Posted to hdfs-user@hadoop.apache.org by Abdul Navaz <na...@gmail.com> on 2014/09/26 16:37:09 UTC

No space when running a hadoop job

Hi

I am facing a space issue when saving files into HDFS and/or running a
MapReduce job.

root@nn:~# df -h

Filesystem                                       Size  Used Avail Use%
Mounted on

/dev/xvda2                                       5.9G  5.9G     0 100% /

udev                                              98M  4.0K   98M   1% /dev

tmpfs                                             48M  192K   48M   1% /run

none                                             5.0M     0  5.0M   0%
/run/lock

none                                             120M     0  120M   0%
/run/shm

overflow                                         1.0M  4.0K 1020K   1% /tmp

/dev/xvda4                                       7.9G  147M  7.4G   2% /mnt

172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G   75G  59%
/groups/ch-geni-net/Hadoop-NET

172.17.253.254:/q/proj/ch-geni-net               198G  108G   75G  59%
/proj/ch-geni-net

root@nn:~# 



I can see there is no space left on /dev/xvda2.

How can I make Hadoop see the newly mounted /dev/xvda4? Or do I need to move
the files manually from /dev/xvda2 to /dev/xvda4?



Thanks & Regards,

Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388




Re: No space when running a hadoop job

Posted by Abdul Navaz <na...@gmail.com>.
Thank you very much. This is what I am trying to do.

This is the storage I have:

Filesystem                                       Size  Used Avail Use%
Mounted on

/dev/xvda2                                       5.9G  5.3G  238M  96% /

/dev/xvda4                                       7.9G  147M  7.4G   2% /mnt


I have configured dfs.datanode.data.dir in hdfs-site.xml:

<name>dfs.datanode.data.dir</name>

<value>/mnt</value>




I have formatted the name node and restarted, and it is still copying to '/',
and when that is full it throws an error instead of copying to '/mnt'.

Error:
14/10/03 15:23:21 WARN hdfs.DFSClient: Could not get block locations. Source
file "/user/hduser/getty/data4" - Aborting...

put: java.io.IOException: File /user/hduser/getty/data4 could only be
replicated to 0 nodes, instead of 1

14/10/03 15:23:21 ERROR hdfs.DFSClient: Failed to close file
/user/hduser/getty/data4



Am I doing anything wrong here?

Thanks & Regards,

Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388


From:  ViSolve Hadoop Support <ha...@visolve.com>
Reply-To:  <us...@hadoop.apache.org>
Date:  Friday, October 3, 2014 at 1:29 AM
To:  <us...@hadoop.apache.org>
Subject:  Re: No space when running a hadoop job

    
 Hello,
 
 If you want to use drive /dev/xvda4 only, then add file location for
'/dev/xvda4' and remove the file location for '/dev/xvda2' under
"dfs.datanode.data.dir".
 
 After the changes restart the hadoop services and check the available space
using the below command.
      # hadoop fs -df -h
 
 Regards,
 ViSolve Hadoop Team
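
For reference, the change being suggested would look roughly like the sketch
below: a minimal hdfs-site.xml fragment, assuming /mnt is the only disk the
DataNode should use (the /mnt/hdfs/dn subdirectory is a hypothetical choice;
note also that on Hadoop 1.x, which the "$HADOOP_HOME is deprecated" warnings
elsewhere in this thread suggest, the property is named dfs.data.dir rather
than dfs.datanode.data.dir):

```xml
<!-- hdfs-site.xml on each DataNode: point HDFS at the /mnt disk only.
     Using a subdirectory such as /mnt/hdfs/dn instead of the bare mount
     point avoids permission and lost+found clutter on the mount root. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/mnt/hdfs/dn</value>
</property>
```

After restarting the DataNode, `hadoop fs -df -h` (or `hadoop dfsadmin
-report`) should show the capacity of the new disk.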
 
  
On 10/3/2014 4:36 AM, Abdul Navaz wrote:
 
 
>  
>  
> Hello,
>  
> 
>  
>  
> As you suggested I have changed the hdfs-site.xml file of datanodes and name
> node as below and formatted the name node.
>  
> 
>  
>  
>  
> 
> </property>
> 
> <property>
>   <name>dfs.datanode.data.dir</name>
>   <value>/mnt</value>
>   <description>Comma separated list of paths. Use the list of directories
>   from $DFS_DATA_DIR. For example,
>   /grid/hadoop/hdfs/dn,/grid1/hadoop/hdfs/dn.</description>
> </property>
>  
>  
> 
>  
>  
> 
>  
>  
>  
> 
> hduser@dn1:~$ df -h
>  
> 
> Filesystem                                       Size  Used Avail Use% Mounted
> on
>  
> 
> /dev/xvda2                                       5.9G  5.3G  258M  96% /
>  
> 
> udev                                              98M  4.0K   98M   1% /dev
>  
> 
> tmpfs                                             48M  196K   48M   1% /run
>  
> 
> none                                             5.0M     0  5.0M   0%
> /run/lock
>  
> 
> none                                             120M     0  120M   0%
> /run/shm
>  
> 
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  113G   70G  62%
> /groups/ch-geni-net/Hadoop-NET
>  
> 
> 172.17.253.254:/q/proj/ch-geni-net               198G  113G   70G  62%
> /proj/ch-geni-net
>  
> 
> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>  
> 
> hduser@dn1:~$ 
>  
>  
> 
>  
>  
> 
>  
>  
> Even after doing so, the file is copied only to /dev/xvda2 instead of
> /dev/xvda4.
>  
> 
>  
>  
> Once /dev/xvda2 is full I am getting the below error message.
>  
> 
>  
>  
>  
> 
> hduser@nn:~$ hadoop fs -put file.txtac /user/hduser/getty/file12.txt
>  
> 
> Warning: $HADOOP_HOME is deprecated.
>  
> 
> 
>  
>  
> 
> 14/10/02 16:52:52 WARN hdfs.DFSClient: DataStreamer Exception:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /user/hduser/getty/file12.txt could only be replicated to 0 nodes, instead of
> 1
>  
> 
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNames
> ystem.java:1639)
>  
>  
>  
> 
>  
>  
> 
>  
>  
> 
>  
>  
> Let me say it like this: I don't want to use /dev/xvda2, as it only has a
> capacity of 5.9 GB; I want to use only /dev/xvda4. How can I do this?
>  
> 
>  
>  
> 
>  
>  
> 
>  
>  
> 
>  
>  
> Thanks & Regards,
>  
> 
>  
>  
> Abdul Navaz
>  
> Research Assistant
>  
> University of Houston Main Campus, Houston TX
>  
> Ph: 281-685-0388
>  
> 
>  
>  
>  
>  
> 
>  
>   
> From:  Abdul Navaz <na...@gmail.com>
>  Date:  Monday, September 29, 2014 at 1:53 PM
>  To:  <us...@hadoop.apache.org>
>  Subject:  Re: No space when running a hadoop job
>  
>  
> 
>  
>  
>  
>  
>  
>  
> Dear All,
>  
> 
>  
>  
> I am not doing load balancing here. I am just copying a file, and it is
> throwing an error: no space left on the device.
>  
> 
>  
>  
> 
>  
>  
>  
> 
> hduser@dn1:~$ df -h
>  
> 
> Filesystem                                       Size  Used Avail Use% Mounted
> on
>  
> 
> /dev/xvda2                                       5.9G  5.1G  533M  91% /
>  
> 
> udev                                              98M  4.0K   98M   1% /dev
>  
> 
> tmpfs                                             48M  196K   48M   1% /run
>  
> 
> none                                             5.0M     0  5.0M   0%
> /run/lock
>  
> 
> none                                             120M     0  120M   0%
> /run/shm
>  
> 
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  116G   67G  64%
> /groups/ch-geni-net/Hadoop-NET
>  
> 
> 172.17.253.254:/q/proj/ch-geni-net               198G  116G   67G  64%
> /proj/ch-geni-net
>  
> 
> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>  
> 
> hduser@dn1:~$ 
>  
> 
> hduser@dn1:~$ 
>  
> 
> hduser@dn1:~$ 
>  
> 
> hduser@dn1:~$ cp data2.txt data3.txt
>  
> 
> cp: writing `data3.txt': No space left on device
>  
> 
> cp: failed to extend `data3.txt': No space left on device
>  
> 
> hduser@dn1:~$ 
>  
>  
> 
>  
>  
>  
> I guess by default it is copying to the default location. Why am I getting
> this error? How can I fix this?
>  
> 
>  
>  
> 
>  
>  
> Thanks & Regards,
>  
> 
>  
>  
> Abdul Navaz
>  
> Research Assistant
>  
> University of Houston Main Campus, Houston TX
>  
> Ph: 281-685-0388
>  
> 
>  
>  
>  
>  
>  
> 
>  
>   
> From:  Aitor Cedres <ac...@pivotal.io>
>  Reply-To:  <us...@hadoop.apache.org>
>  Date:  Monday, September 29, 2014 at 7:53 AM
>  To:  <us...@hadoop.apache.org>
>  Subject:  Re: No space when running a hadoop job
>  
>  
> 
>  
>  
> 
>  
> I think the way it works when HDFS has a list in dfs.datanode.data.dir is
> basically round robin between the disks. And yes, it may not be perfectly
> balanced because of different file sizes.
>  
>  
>  
> 
>  
>  
>  
>  
> On 29 September 2014 13:15, Susheel Kumar Gadalay <sk...@gmail.com> wrote:
>  
>> Thanks, Aitor.
>>  
>>  That is my observation too.
>>  
>>  I added a new disk location and manually moved some files.
>>  
>>  But if two locations are given from the beginning for
>>  dfs.datanode.data.dir, will Hadoop balance the disk usage, even if not
>>  perfectly, since file sizes may differ?
>>  
>>  
>> 
>>  On 9/29/14, Aitor Cedres <ac...@pivotal.io> wrote:
>>>  > Hi Susheel,
>>>  >
>>>  > Adding a new directory to "dfs.datanode.data.dir" will not balance your
>>>  > disks straightforward. Eventually, by HDFS activity
>>> (deleting/invalidating
>>>  > some block, writing new ones), the disks will become balanced. If you want
>>>  > to balance them right after adding the new disk and changing the
>>>  > "dfs.datanode.data.dir"
>>>  > value, you have to shutdown the DN and manually move (mv) some files in
>>> the
>>>  > old directory to the new one.
>>>  >
>>>  > The balancer will try to balance the usage between HDFS nodes, but it
>>> won't
>>>  > care about "internal" node disks utilization. For your particular case,
>>> the
>>>  > balancer won't fix your issue.
>>>  >
>>>  > Hope it helps,
>>>  > Aitor
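
The manual move Aitor describes can be sketched like this (illustrative
paths under /tmp with stand-in block files; on a real node you would use the
actual directories listed in dfs.datanode.data.dir, and the DataNode must be
stopped first):

```shell
# Stop the DataNode before touching its storage (command path is an example):
#   $HADOOP_HOME/bin/hadoop-daemon.sh stop datanode

# Stand-in directories and block files for demonstration:
old=/tmp/dn-old/current
new=/tmp/dn-new/current
mkdir -p "$old" "$new"
touch "$old/blk_1001" "$old/blk_1001_1.meta"   # stand-ins for real HDFS blocks

# Move each block together with its .meta checksum file, so the DataNode can
# still find and verify the block after restart:
mv "$old"/blk_1001* "$new"/

ls "$new"   # both files should now live on the new disk
#   $HADOOP_HOME/bin/hadoop-daemon.sh start datanode
```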
>>>  >
>>>  > On 29 September 2014 05:53, Susheel Kumar Gadalay <sk...@gmail.com>
>>>  > wrote:
>>>  >
>>>>  >> You mean if multiple directory locations are given, Hadoop will
>>>>  >> balance the distribution of files across these different directories.
>>>>  >>
>>>>  >> But normally we start with 1 directory location and once it is
>>>>  >> reaching the maximum, we add new directory.
>>>>  >>
>>>>  >> In this case how can we balance the distribution of files?
>>>>  >>
>>>>  >> One way is to list the files and move.
>>>>  >>
>>>>  >> Will running the start-balancer script work?
>>>>  >>
>>>>  >> On 9/27/14, Alexander Pivovarov <ap...@gmail.com> wrote:
>>>>>  >> > It can read/write in parallel to all drives. More hdd more io speed.
>>>>>  >> >  On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay"
>>>>> <sk...@gmail.com>
>>>>>  >> > wrote:
>>>>>  >> >
>>>>>>  >> >> Correct me if I am wrong.
>>>>>>  >> >>
>>>>>>  >> >> Adding multiple directories will not balance the file distribution
>>>>>>  >> >> across these locations.
>>>>>>  >> >>
>>>>>>  >> >> Hadoop will exhaust the first directory and then start using the
>>>>>>  >> >> next, next ...
>>>>>>  >> >>
>>>>>>  >> >> How can I tell Hadoop to evenly balance across these directories.
>>>>>>  >> >>
>>>>>>  >> >> On 9/26/14, Matt Narrell <ma...@gmail.com> wrote:
>>>>>>>  >> >> > You can add a comma separated list of paths to the
>>>>>>  >> >> "dfs.datanode.data.dir"
>>>>>>>  >> >> > property in your hdfs-site.xml
>>>>>>>  >> >> >
>>>>>>>  >> >> > mn
>>>>>>>  >> >> >
>>>>>>>  >> >> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz <na...@gmail.com>
>>>>>>>  >> >> > wrote:
>>>>>>>  >> >> >
>>>>>>>>  >> >> >> Hi
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> I am facing a space issue when saving files into HDFS and/or
>>>>>>>>  >> >> >> running a MapReduce job.
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> root@nn:~# df -h
>>>>>>>>  >> >> >> Filesystem                                       Size  Used Avail Use% Mounted on
>>>>>>>>  >> >> >> /dev/xvda2                                       5.9G  5.9G     0 100% /
>>>>>>>>  >> >> >> udev                                              98M  4.0K   98M   1% /dev
>>>>>>>>  >> >> >> tmpfs                                             48M  192K   48M   1% /run
>>>>>>>>  >> >> >> none                                             5.0M     0  5.0M   0% /run/lock
>>>>>>>>  >> >> >> none                                             120M     0  120M   0% /run/shm
>>>>>>>>  >> >> >> overflow                                         1.0M  4.0K 1020K   1% /tmp
>>>>>>>>  >> >> >> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>>>>>>>>  >> >> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G   75G  59% /groups/ch-geni-net/Hadoop-NET
>>>>>>>>  >> >> >> 172.17.253.254:/q/proj/ch-geni-net               198G  108G   75G  59% /proj/ch-geni-net
>>>>>>>>  >> >> >> root@nn:~#
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> I can see there is no space left on /dev/xvda2.
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> How can I make Hadoop see the newly mounted /dev/xvda4? Or do I
>>>>>>>>  >> >> >> need to move the files manually from /dev/xvda2 to /dev/xvda4?
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> Thanks & Regards,
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> Abdul Navaz
>>>>>>>>  >> >> >> Research Assistant
>>>>>>>>  >> >> >> University of Houston Main Campus, Houston TX
>>>>>>>>  >> >> >> Ph: 281-685-0388
>>>>>>>>  >> >> >>
>>>>>>>  >> >> >
>>>>>>>  >> >> >
>>>>>>  >> >>
>>>>>  >> >
>>>>  >>
>>>  >
>>  
>>  
>>  
>  
>  
>  
>  
>  
>  
>   
 
 



Re: No space when running a hadoop job

Posted by Abdul Navaz <na...@gmail.com>.
Thank You Very much. This is what I am trying to do.

This is what storage I have.

Filesystem                                       Size  Used Avail Use%
Mounted on

/dev/xvda2                                       5.9G  5.3G  238M  96% /

/dev/xvda4                                       7.9G  147M  7.4G   2% /mnt


I have configured in dfs.datanode.dir in hdfs-site.

<name>dfs.datanode.data.dir</name>

<value>/mnt</value>




I have formatted the name node and restarted and it is still copying to  Œ/
Œ  and if it is full it throws an error instead of copying to  Œ/mnt¹.

Error:
14/10/03 15:23:21 WARN hdfs.DFSClient: Could not get block locations. Source
file "/user/hduser/getty/data4" - Aborting...

put: java.io.IOException: File /user/hduser/getty/data4 could only be
replicated to 0 nodes, instead of 1

14/10/03 15:23:21 ERROR hdfs.DFSClient: Failed to close file
/user/hduser/getty/data4



Am I doing anything wrong here ?

Thanks & Regards,

Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388


From:  ViSolve Hadoop Support <ha...@visolve.com>
Reply-To:  <us...@hadoop.apache.org>
Date:  Friday, October 3, 2014 at 1:29 AM
To:  <us...@hadoop.apache.org>
Subject:  Re: No space when running a hadoop job

    
 Hello,
 
 If you want to use drive /dev/xvda4 only, then add file location for
'/dev/xvda4' and remove the file location for '/dev/xvda2' under
"dfs.datanode.data.dir".
 
 After the changes restart the hadoop services and check the available space
using the below command.
      # hadoop fs -df -h
 
 Regards,
 ViSolve Hadoop Team
 
  
On 10/3/2014 4:36 AM, Abdul Navaz wrote:
 
 
>  
>  
> Hello,
>  
> 
>  
>  
> As you suggested I have changed the hdfs-site.xml file of datanodes and name
> node as below and formatted the name node.
>  
> 
>  
>  
>  
> 
> </property>
>  
> 
> <property>
>  
> 
> <name>dfs.datanode.data.dir</name>
>  
> 
> <value>/mnt</value>
>  
> 
> <description>Comma separated list of paths. Use the list of directories from
> $DFS_DATA_DIR.
>  
> 
>                 For example,
> /grid/hadoop/hdfs/dn,/grid1/hadoop/hdfs/dn.</description>
>  
> 
> </property>
>  
>  
> 
>  
>  
> 
>  
>  
>  
> 
> hduser@dn1:~$ df -h
>  
> 
> Filesystem                                       Size  Used Avail Use% Mounted
> on
>  
> 
> /dev/xvda2                                       5.9G  5.3G  258M  96% /
>  
> 
> udev                                              98M  4.0K   98M   1% /dev
>  
> 
> tmpfs                                             48M  196K   48M   1% /run
>  
> 
> none                                             5.0M     0  5.0M   0%
> /run/lock
>  
> 
> none                                             120M     0  120M   0%
> /run/shm
>  
> 
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  113G   70G  62%
> /groups/ch-geni-net/Hadoop-NET
>  
> 
> 172.17.253.254:/q/proj/ch-geni-net               198G  113G   70G  62%
> /proj/ch-geni-net
>  
> 
> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>  
> 
> hduser@dn1:~$ 
>  
>  
> 
>  
>  
> 
>  
>  
> Even after doing so, the file is copied only to /dev/xvda2 instead of
> /dev/xvda4.
>  
> 
>  
>  
> Once /dev/xvda2 is full I am getting the below error message.
>  
> 
>  
>  
>  
> 
> hduser@nn:~$ hadoop fs -put file.txtac /user/hduser/getty/file12.txt
>  
> 
> Warning: $HADOOP_HOME is deprecated.
>  
> 
> 
>  
>  
> 
> 14/10/02 16:52:52 WARN hdfs.DFSClient: DataStreamer Exception:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /user/hduser/getty/file12.txt could only be replicated to 0 nodes, instead of
> 1
>  
> 
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNames
> ystem.java:1639)
>  
>  
>  
> 
>  
>  
> 
>  
>  
> 
>  
>  
> Let me say like this: I don¹t want to use /dev/xvda2 as it has capacity of
> 5.9GB , I want to use only /dev/xvda4. How can I do this ?
>  
> 
>  
>  
> 
>  
>  
> 
>  
>  
> 
>  
>  
> Thanks & Regards,
>  
> 
>  
>  
> Abdul Navaz
>  
> Research Assistant
>  
> University of Houston Main Campus, Houston TX
>  
> Ph: 281-685-0388
>  
> 
>  
>  
>  
>  
> 
>  
>   
> From:  Abdul Navaz <na...@gmail.com>
>  Date:  Monday, September 29, 2014 at 1:53 PM
>  To:  <us...@hadoop.apache.org>
>  Subject:  Re: No space when running a hadoop job
>  
>  
> 
>  
>  
>  
>  
>  
>  
> Dear All,
>  
> 
>  
>  
> I am not doing load balancing here. I am just copying a file and it is
> throwing me an error no space left on the device.
>  
> 
>  
>  
> 
>  
>  
>  
> 
> hduser@dn1:~$ df -h
>  
> 
> Filesystem                                       Size  Used Avail Use% Mounted
> on
>  
> 
> /dev/xvda2                                       5.9G  5.1G  533M  91% /
>  
> 
> udev                                              98M  4.0K   98M   1% /dev
>  
> 
> tmpfs                                             48M  196K   48M   1% /run
>  
> 
> none                                             5.0M     0  5.0M   0%
> /run/lock
>  
> 
> none                                             120M     0  120M   0%
> /run/shm
>  
> 
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  116G   67G  64%
> /groups/ch-geni-net/Hadoop-NET
>  
> 
> 172.17.253.254:/q/proj/ch-geni-net               198G  116G   67G  64%
> /proj/ch-geni-net
>  
> 
> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>  
> 
> hduser@dn1:~$ 
>  
> 
> hduser@dn1:~$ 
>  
> 
> hduser@dn1:~$ 
>  
> 
> hduser@dn1:~$ cp data2.txt data3.txt
>  
> 
> cp: writing `data3.txt': No space left on device
>  
> 
> cp: failed to extend `data3.txt': No space left on device
>  
> 
> hduser@dn1:~$ 
>  
>  
> 
>  
>  
>  
> I guess by default it is copying to default location. Why I am getting this
> error ? How can I fix this ?
>  
> 
>  
>  
> 
>  
>  
> Thanks & Regards,
>  
> 
>  
>  
> Abdul Navaz
>  
> Research Assistant
>  
> University of Houston Main Campus, Houston TX
>  
> Ph: 281-685-0388
>  
> 
>  
>  
>  
>  
>  
> 
>  
>   
> From:  Aitor Cedres <ac...@pivotal.io>
>  Reply-To:  <us...@hadoop.apache.org>
>  Date:  Monday, September 29, 2014 at 7:53 AM
>  To:  <us...@hadoop.apache.org>
>  Subject:  Re: No space when running a hadoop job
>  
>  
> 
>  
>  
> 
>  
> I think they way it works when HDFS has a list in dfs.datanode.data.dir, it's
> basically a round robin between disks. And yes, it may not be perfect balanced
> cause of different file sizes.
>  
>  
>  
> 
>  
>  
>  
>  
> On 29 September 2014 13:15, Susheel Kumar Gadalay <sk...@gmail.com> wrote:
>  
>> Thank Aitor.
>>  
>>  That is what is my observation too.
>>  
>>  I added a new disk location and manually moved some files.
>>  
>>  But if 2 locations are given at the beginning itself for
>>  dfs.datanode.data.dir, will hadoop balance the disks usage, if not
>>  perfect because file sizes may differ.
>>  
>>  
>> 
>>  On 9/29/14, Aitor Cedres <ac...@pivotal.io> wrote:
>>>  > Hi Susheel,
>>>  >
>>>  > Adding a new directory to ³dfs.datanode.data.dir² will not balance your
>>>  > disks straightforward. Eventually, by HDFS activity
>>> (deleting/invalidating
>>>  > some block, writing new ones), the disks will become balanced. If you >>>
want
>>>  > to balance them right after adding the new disk and changing the
>>>  > ³dfs.datanode.data.dir²
>>>  > value, you have to shutdown the DN and manually move (mv) some files in
>>> the
>>>  > old directory to the new one.
>>>  >
>>>  > The balancer will try to balance the usage between HDFS nodes, but it
>>> won't
>>>  > care about "internal" node disks utilization. For your particular case,
>>> the
>>>  > balancer won't fix your issue.
>>>  >
>>>  > Hope it helps,
>>>  > Aitor
>>>  >
>>>  > On 29 September 2014 05:53, Susheel Kumar Gadalay <sk...@gmail.com>
>>>  > wrote:
>>>  >
>>>>  >> You mean if multiple directory locations are given, Hadoop will
>>>>  >> balance the distribution of files across these different directories.
>>>>  >>
>>>>  >> But normally we start with 1 directory location and once it is
>>>>  >> reaching the maximum, we add new directory.
>>>>  >>
>>>>  >> In this case how can we balance the distribution of files?
>>>>  >>
>>>>  >> One way is to list the files and move.
>>>>  >>
>>>>  >> Will start balance script will work?
>>>>  >>
>>>>  >> On 9/27/14, Alexander Pivovarov <ap...@gmail.com> wrote:
>>>>>  >> > It can read/write in parallel to all drives. More hdd more io speed.
>>>>>  >> >  On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay"
>>>>> <sk...@gmail.com>
>>>>>  >> > wrote:
>>>>>  >> >
>>>>>>  >> >> Correct me if I am wrong.
>>>>>>  >> >>
>>>>>>  >> >> Adding multiple directories will not balance the files
>>>>>> distributions
>>>>>>  >> >> across these locations.
>>>>>>  >> >>
>>>>>>  >> >> Hadoop will add exhaust the first directory and then start using
the
>>>>>>  >> >> next, next ..
>>>>>>  >> >>
>>>>>>  >> >> How can I tell Hadoop to evenly balance across these directories.
>>>>>>  >> >>
>>>>>>  >> >> On 9/26/14, Matt Narrell <ma...@gmail.com> wrote:
>>>>>>>  >> >> > You can add a comma separated list of paths to the
>>>>>>  >> >> ³dfs.datanode.data.dir²
>>>>>>>  >> >> > property in your hdfs-site.xml
>>>>>>>  >> >> >
>>>>>>>  >> >> > mn
>>>>>>>  >> >> >
>>>>>>>  >> >> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz <na...@gmail.com>
>>>>>>>  >> >> > wrote:
>>>>>>>  >> >> >
>>>>>>>>  >> >> >> Hi
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> I am facing some space issue when I saving file into HDFS
and/or
>>>>>>>>  >> >> >> running
>>>>>>>>  >> >> >> map reduce job.
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> root@nn:~# df -h
>>>>>>>>  >> >> >> Filesystem                                       Size  Used
Avail
>>>>  >> Use%
>>>>>>>>  >> >> >> Mounted on
>>>>>>>>  >> >> >> /dev/xvda2                                       5.9G  5.9G
0
>>>>  >> 100%
>>>>>>>>  >> >> >> /
>>>>>>>>  >> >> >> udev                                              98M  4.0K
98M
>>>>  >>  1%
>>>>>>>>  >> >> >> /dev
>>>>>>>>  >> >> >> tmpfs                                             48M  192K
48M
>>>>  >>  1%
>>>>>>>>  >> >> >> /run
>>>>>>>>  >> >> >> none                                             5.0M     0
5.0M
>>>>  >>  0%
>>>>>>>>  >> >> >> /run/lock
>>>>>>>>  >> >> >> none                                             120M     0
120M
>>>>  >>  0%
>>>>>>>>  >> >> >> /run/shm
>>>>>>>>  >> >> >> overflow                                         1.0M  4.0K
1020K
>>>>  >>  1%
>>>>>>>>  >> >> >> /tmp
>>>>>>>>  >> >> >> /dev/xvda4                                       7.9G  147M
7.4G
>>>>  >>  2%
>>>>>>>>  >> >> >> /mnt
>>>>>>>>  >> >> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G
75G
>>>>  >> 59%
>>>>>>>>  >> >> >> /groups/ch-geni-net/Hadoop-NET
>>>>>>>>  >> >> >> 172.17.253.254:/q/proj/ch-geni-net               198G  108G
75G
>>>>  >> 59%
>>>>>>>>  >> >> >> /proj/ch-geni-net
>>>>>>>>  >> >> >> root@nn:~#
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> I can see there is no space left on /dev/xvda2.
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> How can I make hadoop to see newly mounted /dev/xvda4 ? Or do
I
>>>>>>>>  >> >> >> need
>>>>>>>>  >> >> >> to
>>>>>>>>  >> >> >> move the file manually from /dev/xvda2 to xvda4 ?
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> Thanks & Regards,
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> Abdul Navaz
>>>>>>>>  >> >> >> Research Assistant
>>>>>>>>  >> >> >> University of Houston Main Campus, Houston TX
>>>>>>>>  >> >> >> Ph: 281-685-0388
>>>>>>>>  >> >> >>
>>>>>>>  >> >> >
>>>>>>>  >> >> >
>>>>>>  >> >>
>>>>>  >> >
>>>>  >>
>>>  >
>>  
>>  
>>  
>  
>  
>  
>  
>  
>  
>   
 
 



Re: No space when running a hadoop job

Posted by Abdul Navaz <na...@gmail.com>.
Thank You Very much. This is what I am trying to do.

This is what storage I have.

Filesystem                                       Size  Used Avail Use%
Mounted on

/dev/xvda2                                       5.9G  5.3G  238M  96% /

/dev/xvda4                                       7.9G  147M  7.4G   2% /mnt


I have configured in dfs.datanode.dir in hdfs-site.

<name>dfs.datanode.data.dir</name>

<value>/mnt</value>




I have formatted the name node and restarted and it is still copying to  Œ/
Œ  and if it is full it throws an error instead of copying to  Œ/mnt¹.

Error:
14/10/03 15:23:21 WARN hdfs.DFSClient: Could not get block locations. Source
file "/user/hduser/getty/data4" - Aborting...

put: java.io.IOException: File /user/hduser/getty/data4 could only be
replicated to 0 nodes, instead of 1

14/10/03 15:23:21 ERROR hdfs.DFSClient: Failed to close file
/user/hduser/getty/data4



Am I doing anything wrong here ?

Thanks & Regards,

Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388


From:  ViSolve Hadoop Support <ha...@visolve.com>
Reply-To:  <us...@hadoop.apache.org>
Date:  Friday, October 3, 2014 at 1:29 AM
To:  <us...@hadoop.apache.org>
Subject:  Re: No space when running a hadoop job

    
 Hello,
 
 If you want to use drive /dev/xvda4 only, then add file location for
'/dev/xvda4' and remove the file location for '/dev/xvda2' under
"dfs.datanode.data.dir".
 
 After the changes restart the hadoop services and check the available space
using the below command.
      # hadoop fs -df -h
 
 Regards,
 ViSolve Hadoop Team
 
  
On 10/3/2014 4:36 AM, Abdul Navaz wrote:
 
 
>  
>  
> Hello,
>  
> 
>  
>  
> As you suggested I have changed the hdfs-site.xml file of datanodes and name
> node as below and formatted the name node.
>  
> 
>  
>  
>  
> 
> </property>
>  
> 
> <property>
>  
> 
> <name>dfs.datanode.data.dir</name>
>  
> 
> <value>/mnt</value>
>  
> 
> <description>Comma separated list of paths. Use the list of directories from
> $DFS_DATA_DIR.
>  
> 
>                 For example,
> /grid/hadoop/hdfs/dn,/grid1/hadoop/hdfs/dn.</description>
>  
> 
> </property>
>  
>  
> 
>  
>  
> 
>  
>  
>  
> 
> hduser@dn1:~$ df -h
>  
> 
> Filesystem                                       Size  Used Avail Use% Mounted
> on
>  
> 
> /dev/xvda2                                       5.9G  5.3G  258M  96% /
>  
> 
> udev                                              98M  4.0K   98M   1% /dev
>  
> 
> tmpfs                                             48M  196K   48M   1% /run
>  
> 
> none                                             5.0M     0  5.0M   0%
> /run/lock
>  
> 
> none                                             120M     0  120M   0%
> /run/shm
>  
> 
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  113G   70G  62%
> /groups/ch-geni-net/Hadoop-NET
>  
> 
> 172.17.253.254:/q/proj/ch-geni-net               198G  113G   70G  62%
> /proj/ch-geni-net
>  
> 
> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>  
> 
> hduser@dn1:~$ 
>  
>  
> 
>  
>  
> 
>  
>  
> Even after doing so, the file is copied only to /dev/xvda2 instead of
> /dev/xvda4.
>  
> 
>  
>  
> Once /dev/xvda2 is full I am getting the below error message.
>  
> 
>  
>  
>  
> 
> hduser@nn:~$ hadoop fs -put file.txtac /user/hduser/getty/file12.txt
>  
> 
> Warning: $HADOOP_HOME is deprecated.
>  
> 
> 
>  
>  
> 
> 14/10/02 16:52:52 WARN hdfs.DFSClient: DataStreamer Exception:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /user/hduser/getty/file12.txt could only be replicated to 0 nodes, instead of
> 1
>  
> 
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNames
> ystem.java:1639)
>  
>  
>  
> 
>  
>  
> 
>  
>  
> 
>  
>  
> Let me say like this: I don¹t want to use /dev/xvda2 as it has capacity of
> 5.9GB , I want to use only /dev/xvda4. How can I do this ?
>  
> 
>  
>  
> 
>  
>  
> 
>  
>  
> 
>  
>  
> Thanks & Regards,
>  
> 
>  
>  
> Abdul Navaz
>  
> Research Assistant
>  
> University of Houston Main Campus, Houston TX
>  
> Ph: 281-685-0388
>  
> 
>  
>  
>  
>  
> 
>  
>   
> From:  Abdul Navaz <na...@gmail.com>
>  Date:  Monday, September 29, 2014 at 1:53 PM
>  To:  <us...@hadoop.apache.org>
>  Subject:  Re: No space when running a hadoop job
>  
>  
> 
>  
>  
>  
>  
>  
>  
> Dear All,
>  
> 
>  
>  
> I am not doing load balancing here. I am just copying a file and it is
> throwing me an error no space left on the device.
>  
> 
>  
>  
> 
>  
>  
>  
> 
> hduser@dn1:~$ df -h
>  
> 
> Filesystem                                       Size  Used Avail Use% Mounted
> on
>  
> 
> /dev/xvda2                                       5.9G  5.1G  533M  91% /
>  
> 
> udev                                              98M  4.0K   98M   1% /dev
>  
> 
> tmpfs                                             48M  196K   48M   1% /run
>  
> 
> none                                             5.0M     0  5.0M   0%
> /run/lock
>  
> 
> none                                             120M     0  120M   0%
> /run/shm
>  
> 
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  116G   67G  64%
> /groups/ch-geni-net/Hadoop-NET
>  
> 
> 172.17.253.254:/q/proj/ch-geni-net               198G  116G   67G  64%
> /proj/ch-geni-net
>  
> 
> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>  
> 
> hduser@dn1:~$ 
>  
> 
> hduser@dn1:~$ 
>  
> 
> hduser@dn1:~$ 
>  
> 
> hduser@dn1:~$ cp data2.txt data3.txt
>  
> 
> cp: writing `data3.txt': No space left on device
>  
> 
> cp: failed to extend `data3.txt': No space left on device
>  
> 
> hduser@dn1:~$ 
>  
>  
> 
>  
>  
>  
> I guess by default it is copying to default location. Why I am getting this
> error ? How can I fix this ?
>  
> 
>  
>  
> 
>  
>  
> Thanks & Regards,
>  
> 
>  
>  
> Abdul Navaz
>  
> Research Assistant
>  
> University of Houston Main Campus, Houston TX
>  
> Ph: 281-685-0388
>  
> 
>  
>  
>  
>  
>  
> 
>  
>   
> From:  Aitor Cedres <ac...@pivotal.io>
>  Reply-To:  <us...@hadoop.apache.org>
>  Date:  Monday, September 29, 2014 at 7:53 AM
>  To:  <us...@hadoop.apache.org>
>  Subject:  Re: No space when running a hadoop job
>  
>  
> 
>  
>  
> 
>  
> I think they way it works when HDFS has a list in dfs.datanode.data.dir, it's
> basically a round robin between disks. And yes, it may not be perfect balanced
> cause of different file sizes.
>  
>  
>  
> 
>  
>  
>  
>  
> On 29 September 2014 13:15, Susheel Kumar Gadalay <sk...@gmail.com> wrote:
>  
>> Thank Aitor.
>>  
>>  That is what is my observation too.
>>  
>>  I added a new disk location and manually moved some files.
>>  
>>  But if 2 locations are given at the beginning itself for
>>  dfs.datanode.data.dir, will hadoop balance the disks usage, if not
>>  perfect because file sizes may differ.
>>  
>>  
>> 
>>  On 9/29/14, Aitor Cedres <ac...@pivotal.io> wrote:
>>>  > Hi Susheel,
>>>  >
>>>  > Adding a new directory to ³dfs.datanode.data.dir² will not balance your
>>>  > disks straightforward. Eventually, by HDFS activity
>>> (deleting/invalidating
>>>  > some block, writing new ones), the disks will become balanced. If you want
>>>  > to balance them right after adding the new disk and changing the
>>>  > "dfs.datanode.data.dir"
>>>  > value, you have to shutdown the DN and manually move (mv) some files in
>>> the
>>>  > old directory to the new one.
>>>  >
>>>  > The balancer will try to balance the usage between HDFS nodes, but it
>>> won't
>>>  > care about "internal" node disks utilization. For your particular case,
>>> the
>>>  > balancer won't fix your issue.
>>>  >
>>>  > Hope it helps,
>>>  > Aitor
>>>  >
>>>  > On 29 September 2014 05:53, Susheel Kumar Gadalay <sk...@gmail.com>
>>>  > wrote:
>>>  >
>>>>  >> You mean if multiple directory locations are given, Hadoop will
>>>>  >> balance the distribution of files across these different directories.
>>>>  >>
>>>>  >> But normally we start with 1 directory location and once it is
>>>>  >> reaching the maximum, we add new directory.
>>>>  >>
>>>>  >> In this case how can we balance the distribution of files?
>>>>  >>
>>>>  >> One way is to list the files and move.
>>>>  >>
>>>>  >> Will the start-balancer script work?
>>>>  >>
>>>>  >> On 9/27/14, Alexander Pivovarov <ap...@gmail.com> wrote:
>>>>>  >> > It can read/write in parallel to all drives. More hdd more io speed.
>>>>>  >> >  On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay"
>>>>> <sk...@gmail.com>
>>>>>  >> > wrote:
>>>>>  >> >
>>>>>>  >> >> Correct me if I am wrong.
>>>>>>  >> >>
>>>>>>  >> >> Adding multiple directories will not balance the files
>>>>>> distributions
>>>>>>  >> >> across these locations.
>>>>>>  >> >>
>>>>>>  >> >> Hadoop will exhaust the first directory and then start using the
>>>>>>  >> >> next, next ..
>>>>>>  >> >>
>>>>>>  >> >> How can I tell Hadoop to evenly balance across these directories.
>>>>>>  >> >>
>>>>>>  >> >> On 9/26/14, Matt Narrell <ma...@gmail.com> wrote:
>>>>>>>  >> >> > You can add a comma separated list of paths to the
>>>>>>  >> >> "dfs.datanode.data.dir"
>>>>>>>  >> >> > property in your hdfs-site.xml
>>>>>>>  >> >> >
>>>>>>>  >> >> > mn
>>>>>>>  >> >> >
>>>>>>>  >> >> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz <na...@gmail.com>
>>>>>>>  >> >> > wrote:
>>>>>>>  >> >> >
>>>>>>>>  >> >> >> Hi
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> I am facing some space issue when saving a file into HDFS and/or
>>>>>>>>  >> >> >> running a map reduce job.
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> root@nn:~# df -h
>>>>>>>>  >> >> >> Filesystem                                       Size  Used Avail Use% Mounted on
>>>>>>>>  >> >> >> /dev/xvda2                                       5.9G  5.9G     0 100% /
>>>>>>>>  >> >> >> udev                                              98M  4.0K   98M   1% /dev
>>>>>>>>  >> >> >> tmpfs                                             48M  192K   48M   1% /run
>>>>>>>>  >> >> >> none                                             5.0M     0  5.0M   0% /run/lock
>>>>>>>>  >> >> >> none                                             120M     0  120M   0% /run/shm
>>>>>>>>  >> >> >> overflow                                         1.0M  4.0K 1020K   1% /tmp
>>>>>>>>  >> >> >> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>>>>>>>>  >> >> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G   75G  59% /groups/ch-geni-net/Hadoop-NET
>>>>>>>>  >> >> >> 172.17.253.254:/q/proj/ch-geni-net               198G  108G   75G  59% /proj/ch-geni-net
>>>>>>>>  >> >> >> root@nn:~#
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> I can see there is no space left on /dev/xvda2.
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> How can I make Hadoop see the newly mounted /dev/xvda4? Or do I need
>>>>>>>>  >> >> >> to move the file manually from /dev/xvda2 to xvda4?
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> Thanks & Regards,
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> Abdul Navaz
>>>>>>>>  >> >> >> Research Assistant
>>>>>>>>  >> >> >> University of Houston Main Campus, Houston TX
>>>>>>>>  >> >> >> Ph: 281-685-0388
>>>>>>>>  >> >> >>
>>>>>>>  >> >> >
>>>>>>>  >> >> >
>>>>>>  >> >>
>>>>>  >> >
>>>>  >>
>>>  >
>>  
>>  
>>  
>  
>  
>  
>  
>  
>  
>   
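The round-robin placement Aitor describes can be illustrated with a small sketch. This is an illustration only, not Hadoop's actual volume-choosing code; the paths are made up:

```python
# Illustrative round-robin choice of a data directory per new block,
# mirroring the behaviour described in the thread (not Hadoop source).
from itertools import cycle

class RoundRobinVolumeChooser:
    """Hand out configured data directories in turn, one per new block."""
    def __init__(self, volumes):
        self._cycle = cycle(volumes)

    def choose(self):
        # Each call returns the next volume in the rotation.
        return next(self._cycle)

chooser = RoundRobinVolumeChooser(["/data1", "/data2"])
placements = [chooser.choose() for _ in range(4)]
print(placements)  # alternates between the two volumes
```

Because blocks differ in size, this rotation keeps block *counts* roughly even, not bytes, which is why the disks may still end up imperfectly balanced.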
 
 



Re: No space when running a hadoop job

Posted by Abdul Navaz <na...@gmail.com>.
Thank you very much. This is what I am trying to do.

This is what storage I have.

Filesystem                                       Size  Used Avail Use%
Mounted on

/dev/xvda2                                       5.9G  5.3G  238M  96% /

/dev/xvda4                                       7.9G  147M  7.4G   2% /mnt


I have configured dfs.datanode.data.dir in hdfs-site.xml.

<name>dfs.datanode.data.dir</name>

<value>/mnt</value>




I have formatted the name node and restarted, and it is still copying to '/',
and when it is full it throws an error instead of copying to '/mnt'.

Error:
14/10/03 15:23:21 WARN hdfs.DFSClient: Could not get block locations. Source
file "/user/hduser/getty/data4" - Aborting...

put: java.io.IOException: File /user/hduser/getty/data4 could only be
replicated to 0 nodes, instead of 1

14/10/03 15:23:21 ERROR hdfs.DFSClient: Failed to close file
/user/hduser/getty/data4



Am I doing anything wrong here?

Thanks & Regards,

Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388
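A note on the configuration attempted above: HDFS data directories are conventionally pointed at a dedicated subdirectory of the mount (created first and owned by the user running the DataNode), not at the mount point itself. A minimal hdfs-site.xml sketch, with an illustrative path:

```xml
<!-- Sketch only; /mnt/hdfs/data is an illustrative path.
     Create it first and chown it to the user running the DataNode. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/mnt/hdfs/data</value>
</property>
```

In Hadoop 1.x, if this setting is not picked up, blocks fall back to the default of ${hadoop.tmp.dir}/dfs/data, which would explain data still landing on '/'.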


From:  ViSolve Hadoop Support <ha...@visolve.com>
Reply-To:  <us...@hadoop.apache.org>
Date:  Friday, October 3, 2014 at 1:29 AM
To:  <us...@hadoop.apache.org>
Subject:  Re: No space when running a hadoop job

    
 Hello,
 
 If you want to use drive /dev/xvda4 only, then add file location for
'/dev/xvda4' and remove the file location for '/dev/xvda2' under
"dfs.datanode.data.dir".
 
 After the changes restart the hadoop services and check the available space
using the below command.
      # hadoop fs -df -h
 
 Regards,
 ViSolve Hadoop Team
 
  
On 10/3/2014 4:36 AM, Abdul Navaz wrote:
 
 
>  
>  
> Hello,
>  
> 
>  
>  
> As you suggested I have changed the hdfs-site.xml file of datanodes and name
> node as below and formatted the name node.
>  
> 
>  
>  
>  
> 
> </property>
>  
> 
> <property>
>  
> 
> <name>dfs.datanode.data.dir</name>
>  
> 
> <value>/mnt</value>
>  
> 
> <description>Comma separated list of paths. Use the list of directories from
> $DFS_DATA_DIR.
>  
> 
>                 For example,
> /grid/hadoop/hdfs/dn,/grid1/hadoop/hdfs/dn.</description>
>  
> 
> </property>
>  
>  
> 
>  
>  
> 
>  
>  
>  
> 
> hduser@dn1:~$ df -h
>  
> 
> Filesystem                                       Size  Used Avail Use% Mounted
> on
>  
> 
> /dev/xvda2                                       5.9G  5.3G  258M  96% /
>  
> 
> udev                                              98M  4.0K   98M   1% /dev
>  
> 
> tmpfs                                             48M  196K   48M   1% /run
>  
> 
> none                                             5.0M     0  5.0M   0%
> /run/lock
>  
> 
> none                                             120M     0  120M   0%
> /run/shm
>  
> 
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  113G   70G  62%
> /groups/ch-geni-net/Hadoop-NET
>  
> 
> 172.17.253.254:/q/proj/ch-geni-net               198G  113G   70G  62%
> /proj/ch-geni-net
>  
> 
> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>  
> 
> hduser@dn1:~$ 
>  
>  
> 
>  
>  
> 
>  
>  
> Even after doing so, the file is copied only to /dev/xvda2 instead of
> /dev/xvda4.
>  
> 
>  
>  
> Once /dev/xvda2 is full I am getting the below error message.
>  
> 
>  
>  
>  
> 
> hduser@nn:~$ hadoop fs -put file.txtac /user/hduser/getty/file12.txt
>  
> 
> Warning: $HADOOP_HOME is deprecated.
>  
> 
> 
>  
>  
> 
> 14/10/02 16:52:52 WARN hdfs.DFSClient: DataStreamer Exception:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /user/hduser/getty/file12.txt could only be replicated to 0 nodes, instead of
> 1
>  
> 
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNames
> ystem.java:1639)
>  
>  
>  
> 
>  
>  
> 
>  
>  
> 
>  
>  
> Let me say like this: I don't want to use /dev/xvda2 as it has a capacity of
> 5.9GB; I want to use only /dev/xvda4. How can I do this?
>  
> 
>  
>  
> 
>  
>  
> 
>  
>  
> 
>  
>  
> Thanks & Regards,
>  
> 
>  
>  
> Abdul Navaz
>  
> Research Assistant
>  
> University of Houston Main Campus, Houston TX
>  
> Ph: 281-685-0388
>  
> 
>  
>  
>  
>  
> 
>  
>   
> From:  Abdul Navaz <na...@gmail.com>
>  Date:  Monday, September 29, 2014 at 1:53 PM
>  To:  <us...@hadoop.apache.org>
>  Subject:  Re: No space when running a hadoop job
>  
>  
> 
>  
>  
>  
>  
>  
>  
> Dear All,
>  
> 
>  
>  
> I am not doing load balancing here. I am just copying a file and it is
> throwing me an error no space left on the device.
>  
> 
>  
>  
> 
>  
>  
>  
> 
> hduser@dn1:~$ df -h
>  
> 
> Filesystem                                       Size  Used Avail Use% Mounted
> on
>  
> 
> /dev/xvda2                                       5.9G  5.1G  533M  91% /
>  
> 
> udev                                              98M  4.0K   98M   1% /dev
>  
> 
> tmpfs                                             48M  196K   48M   1% /run
>  
> 
> none                                             5.0M     0  5.0M   0%
> /run/lock
>  
> 
> none                                             120M     0  120M   0%
> /run/shm
>  
> 
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  116G   67G  64%
> /groups/ch-geni-net/Hadoop-NET
>  
> 
> 172.17.253.254:/q/proj/ch-geni-net               198G  116G   67G  64%
> /proj/ch-geni-net
>  
> 
> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>  
> 
> hduser@dn1:~$ 
>  
> 
> hduser@dn1:~$ 
>  
> 
> hduser@dn1:~$ 
>  
> 
> hduser@dn1:~$ cp data2.txt data3.txt
>  
> 
> cp: writing `data3.txt': No space left on device
>  
> 
> cp: failed to extend `data3.txt': No space left on device
>  
> 
> hduser@dn1:~$ 
>  
>  
> 
>  
>  
>  
> I guess by default it is copying to the default location. Why am I getting this
> error? How can I fix this?
>  
> 
>  
>  
> 
>  
>  
> Thanks & Regards,
>  
> 
>  
>  
> Abdul Navaz
>  
> Research Assistant
>  
> University of Houston Main Campus, Houston TX
>  
> Ph: 281-685-0388
>  
> 
>  
>  
>  
>  
>  
> 
>  
>   
> From:  Aitor Cedres <ac...@pivotal.io>
>  Reply-To:  <us...@hadoop.apache.org>
>  Date:  Monday, September 29, 2014 at 7:53 AM
>  To:  <us...@hadoop.apache.org>
>  Subject:  Re: No space when running a hadoop job
>  
>  
> 
>  
>  
> 
>  
> I think the way it works when HDFS has a list in dfs.datanode.data.dir, it's
> basically a round robin between disks. And yes, it may not be perfect balanced
> cause of different file sizes.
>  
>  
>  
> 
>  
>  
>  
>  
> On 29 September 2014 13:15, Susheel Kumar Gadalay <sk...@gmail.com> wrote:
>  
>> Thanks Aitor.
>>  
>>  That is my observation too.
>>  
>>  I added a new disk location and manually moved some files.
>>  
>>  But if 2 locations are given at the beginning itself for
>>  dfs.datanode.data.dir, will hadoop balance the disks usage, if not
>>  perfect because file sizes may differ.
>>  
>>  
>> 
>>  On 9/29/14, Aitor Cedres <ac...@pivotal.io> wrote:
>>>  > Hi Susheel,
>>>  >
>>>  > Adding a new directory to "dfs.datanode.data.dir" will not balance your
>>>  > disks straightforward. Eventually, by HDFS activity
>>> (deleting/invalidating
>>>  > some block, writing new ones), the disks will become balanced. If you >>>
want
>>>  > to balance them right after adding the new disk and changing the
>>>  > "dfs.datanode.data.dir"
>>>  > value, you have to shutdown the DN and manually move (mv) some files in
>>> the
>>>  > old directory to the new one.
>>>  >
>>>  > The balancer will try to balance the usage between HDFS nodes, but it
>>> won't
>>>  > care about "internal" node disks utilization. For your particular case,
>>> the
>>>  > balancer won't fix your issue.
>>>  >
>>>  > Hope it helps,
>>>  > Aitor
>>>  >
>>>  > On 29 September 2014 05:53, Susheel Kumar Gadalay <sk...@gmail.com>
>>>  > wrote:
>>>  >
>>>>  >> You mean if multiple directory locations are given, Hadoop will
>>>>  >> balance the distribution of files across these different directories.
>>>>  >>
>>>>  >> But normally we start with 1 directory location and once it is
>>>>  >> reaching the maximum, we add new directory.
>>>>  >>
>>>>  >> In this case how can we balance the distribution of files?
>>>>  >>
>>>>  >> One way is to list the files and move.
>>>>  >>
>>>>  >> Will the start-balancer script work?
>>>>  >>
>>>>  >> On 9/27/14, Alexander Pivovarov <ap...@gmail.com> wrote:
>>>>>  >> > It can read/write in parallel to all drives. More hdd more io speed.
>>>>>  >> >  On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay"
>>>>> <sk...@gmail.com>
>>>>>  >> > wrote:
>>>>>  >> >
>>>>>>  >> >> Correct me if I am wrong.
>>>>>>  >> >>
>>>>>>  >> >> Adding multiple directories will not balance the files
>>>>>> distributions
>>>>>>  >> >> across these locations.
>>>>>>  >> >>
>>>>>>  >> >> Hadoop will exhaust the first directory and then start using the
>>>>>>  >> >> next, next ..
>>>>>>  >> >>
>>>>>>  >> >> How can I tell Hadoop to evenly balance across these directories.
>>>>>>  >> >>
>>>>>>  >> >> On 9/26/14, Matt Narrell <ma...@gmail.com> wrote:
>>>>>>>  >> >> > You can add a comma separated list of paths to the
>>>>>>  >> >> "dfs.datanode.data.dir"
>>>>>>>  >> >> > property in your hdfs-site.xml
>>>>>>>  >> >> >
>>>>>>>  >> >> > mn
>>>>>>>  >> >> >
>>>>>>>  >> >> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz <na...@gmail.com>
>>>>>>>  >> >> > wrote:
>>>>>>>  >> >> >
>>>>>>>>  >> >> >> Hi
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> I am facing some space issue when saving a file into HDFS and/or
>>>>>>>>  >> >> >> running a map reduce job.
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> root@nn:~# df -h
>>>>>>>>  >> >> >> Filesystem                                       Size  Used Avail Use% Mounted on
>>>>>>>>  >> >> >> /dev/xvda2                                       5.9G  5.9G     0 100% /
>>>>>>>>  >> >> >> udev                                              98M  4.0K   98M   1% /dev
>>>>>>>>  >> >> >> tmpfs                                             48M  192K   48M   1% /run
>>>>>>>>  >> >> >> none                                             5.0M     0  5.0M   0% /run/lock
>>>>>>>>  >> >> >> none                                             120M     0  120M   0% /run/shm
>>>>>>>>  >> >> >> overflow                                         1.0M  4.0K 1020K   1% /tmp
>>>>>>>>  >> >> >> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>>>>>>>>  >> >> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G   75G  59% /groups/ch-geni-net/Hadoop-NET
>>>>>>>>  >> >> >> 172.17.253.254:/q/proj/ch-geni-net               198G  108G   75G  59% /proj/ch-geni-net
>>>>>>>>  >> >> >> root@nn:~#
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> I can see there is no space left on /dev/xvda2.
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> How can I make Hadoop see the newly mounted /dev/xvda4? Or do I need
>>>>>>>>  >> >> >> to move the file manually from /dev/xvda2 to xvda4?
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> Thanks & Regards,
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> Abdul Navaz
>>>>>>>>  >> >> >> Research Assistant
>>>>>>>>  >> >> >> University of Houston Main Campus, Houston TX
>>>>>>>>  >> >> >> Ph: 281-685-0388
>>>>>>>>  >> >> >>
>>>>>>>  >> >> >
>>>>>>>  >> >> >
>>>>>>  >> >>
>>>>>  >> >
>>>>  >>
>>>  >
>>  
>>  
>>  
>  
>  
>  
>  
>  
>  
>   
 
 



Re: No space when running a hadoop job

Posted by ViSolve Hadoop Support <ha...@visolve.com>.
Hello,

If you want to use drive /dev/xvda4 only, then add file location for 
'/dev/xvda4' and remove the file location for '/dev/xvda2' under 
"dfs.datanode.data.dir".

After the changes restart the hadoop services and check the available 
space using the below command.
      # hadoop fs -df -h

Regards,
ViSolve Hadoop Team
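The comma-separated form suggested further down this thread would look like the sketch below; both paths are illustrative, and with multiple entries the DataNode rotates new blocks across them:

```xml
<!-- Sketch only; both paths are illustrative. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/mnt/hdfs/data,/data1/hdfs/data</value>
</property>
```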

On 10/3/2014 4:36 AM, Abdul Navaz wrote:
> Hello,
>
> As you suggested I have changed the hdfs-site.xml file of datanodes 
> and name node as below and formatted the name node.
>
> </property>
>
> <property>
>
> <name>dfs.datanode.data.dir</name>
>
> <value>/mnt</value>
>
> <description>Comma separated list of paths. Use the list of 
> directories from $DFS_DATA_DIR.
>
>                 For example, 
> /grid/hadoop/hdfs/dn,/grid1/hadoop/hdfs/dn.</description>
>
> </property>
>
>
>
> hduser@dn1:~$ df -h
>
> Filesystem                             Size  Used Avail Use% Mounted on
>
> /dev/xvda2                             5.9G  5.3G  258M  96% /
>
> udev                             98M  4.0K   98M   1% /dev
>
> tmpfs                             48M  196K   48M   1% /run
>
> none                             5.0M     0  5.0M   0% /run/lock
>
> none                             120M     0  120M   0% /run/shm
>
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET 198G  113G   70G  62% 
> /groups/ch-geni-net/Hadoop-NET
>
> 172.17.253.254:/q/proj/ch-geni-net               198G  113G   70G  62% 
> /proj/ch-geni-net
>
> /dev/xvda4                             7.9G  147M  7.4G   2% /mnt
>
> hduser@dn1:~$
>
>
>
> Even after doing so, the file is copied only to /dev/xvda2 instead of 
> /dev/xvda4.
>
> Once /dev/xvda2 is full I am getting the below error message.
>
> hduser@nn:~$ hadoop fs -put file.txtac /user/hduser/getty/file12.txt
>
> Warning: $HADOOP_HOME is deprecated.
>
>
> 14/10/02 16:52:52 WARN hdfs.DFSClient: DataStreamer Exception: 
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File 
> /user/hduser/getty/file12.txt could only be replicated to 0 nodes, 
> instead of 1
>
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
>
>
>
>
> Let me say like this: I don't want to use /dev/xvda2 as it has a
> capacity of 5.9GB; I want to use only /dev/xvda4. How can I do this?
>
>
>
>
> Thanks & Regards,
>
> Abdul Navaz
> Research Assistant
> University of Houston Main Campus, Houston TX
> Ph: 281-685-0388
>
>
> From: Abdul Navaz <navaz.enc@gmail.com <ma...@gmail.com>>
> Date: Monday, September 29, 2014 at 1:53 PM
> To: <user@hadoop.apache.org <ma...@hadoop.apache.org>>
> Subject: Re: No space when running a hadoop job
>
> Dear All,
>
> I am not doing load balancing here. I am just copying a file and it is 
> throwing me an error no space left on the device.
>
>
> hduser@dn1:~$ df -h
>
> Filesystem                                     Size  Used Avail Use% 
> Mounted on
>
> /dev/xvda2               5.9G  5.1G  533M  91% /
>
> udev                                     98M  4.0K   98M   1% /dev
>
> tmpfs                                     48M  196K   48M   1% /run
>
> none                                     5.0M     0  5.0M   0% /run/lock
>
> none                                     120M     0  120M   0% /run/shm
>
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET 198G  116G   67G  64% 
> /groups/ch-geni-net/Hadoop-NET
>
> 172.17.253.254:/q/proj/ch-geni-net               198G  116G   67G  64% 
> /proj/ch-geni-net
>
> /dev/xvda4               7.9G  147M  7.4G   2% /mnt
>
> hduser@dn1:~$
>
> hduser@dn1:~$
>
> hduser@dn1:~$
>
> hduser@dn1:~$ cp data2.txt data3.txt
>
> cp: writing `data3.txt': No space left on device
>
> cp: failed to extend `data3.txt': No space left on device
>
> hduser@dn1:~$
>
>
> I guess by default it is copying to the default location. Why am I getting
> this error? How can I fix this?
>
>
> Thanks & Regards,
>
> Abdul Navaz
> Research Assistant
> University of Houston Main Campus, Houston TX
> Ph: 281-685-0388
>
>
> From: Aitor Cedres <acedres@pivotal.io <ma...@pivotal.io>>
> Reply-To: <user@hadoop.apache.org <ma...@hadoop.apache.org>>
> Date: Monday, September 29, 2014 at 7:53 AM
> To: <user@hadoop.apache.org <ma...@hadoop.apache.org>>
> Subject: Re: No space when running a hadoop job
>
>
> I think the way it works when HDFS has a list 
> in dfs.datanode.data.dir, it's basically a round robin between disks. 
> And yes, it may not be perfect balanced cause of different file sizes.
>
>
> On 29 September 2014 13:15, Susheel Kumar Gadalay <skgadalay@gmail.com 
> <ma...@gmail.com>> wrote:
>
>     Thanks Aitor.
>
>     That is my observation too.
>
>     I added a new disk location and manually moved some files.
>
>     But if 2 locations are given at the beginning itself for
>     dfs.datanode.data.dir, will hadoop balance the disks usage, if not
>     perfect because file sizes may differ.
>
>     On 9/29/14, Aitor Cedres <acedres@pivotal.io
>     <ma...@pivotal.io>> wrote:
>     > Hi Susheel,
>     >
>     > Adding a new directory to "dfs.datanode.data.dir" will not
>     balance your
>     > disks straightforward. Eventually, by HDFS activity
>     (deleting/invalidating
>     > some block, writing new ones), the disks will become balanced.
>     If you want
>     > to balance them right after adding the new disk and changing the
>     > "dfs.datanode.data.dir"
>     > value, you have to shutdown the DN and manually move (mv) some
>     files in the
>     > old directory to the new one.
>     >
>     > The balancer will try to balance the usage between HDFS nodes,
>     but it won't
>     > care about "internal" node disks utilization. For your
>     particular case, the
>     > balancer won't fix your issue.
>     >
>     > Hope it helps,
>     > Aitor
>     >
>     > On 29 September 2014 05:53, Susheel Kumar Gadalay
>     <skgadalay@gmail.com <ma...@gmail.com>>
>     > wrote:
>     >
>     >> You mean if multiple directory locations are given, Hadoop will
>     >> balance the distribution of files across these different
>     directories.
>     >>
>     >> But normally we start with 1 directory location and once it is
>     >> reaching the maximum, we add new directory.
>     >>
>     >> In this case how can we balance the distribution of files?
>     >>
>     >> One way is to list the files and move.
>     >>
>     >> Will the start-balancer script work?
>     >>
>     >> On 9/27/14, Alexander Pivovarov <apivovarov@gmail.com
>     <ma...@gmail.com>> wrote:
>     >> > It can read/write in parallel to all drives. More hdd more io
>     speed.
>     >> >  On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay"
>     <skgadalay@gmail.com <ma...@gmail.com>>
>     >> > wrote:
>     >> >
>     >> >> Correct me if I am wrong.
>     >> >>
>     >> >> Adding multiple directories will not balance the files
>     distributions
>     >> >> across these locations.
>     >> >>
>     >> >> Hadoop will exhaust the first directory and then start
>     using the
>     >> >> next, next ..
>     >> >>
>     >> >> How can I tell Hadoop to evenly balance across these
>     directories.
>     >> >>
>     >> >> On 9/26/14, Matt Narrell <matt.narrell@gmail.com
>     <ma...@gmail.com>> wrote:
>     >> >> > You can add a comma separated list of paths to the
>     >> >> "dfs.datanode.data.dir"
>     >> >> > property in your hdfs-site.xml
>     >> >> >
>     >> >> > mn
>     >> >> >
>     >> >> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz
>     <navaz.enc@gmail.com <ma...@gmail.com>>
>     >> >> > wrote:
>     >> >> >
>     >> >> >> Hi
>     >> >> >>
>     >> >> >> I am facing some space issue when saving a file into HDFS and/or
>     >> >> >> running a map reduce job.
>     >> >> >>
>     >> >> >> root@nn:~# df -h
>     >> >> >> Filesystem                                       Size  Used Avail Use% Mounted on
>     >> >> >> /dev/xvda2                                       5.9G  5.9G     0 100% /
>     >> >> >> udev                                              98M  4.0K   98M   1% /dev
>     >> >> >> tmpfs                                             48M  192K   48M   1% /run
>     >> >> >> none                                             5.0M     0  5.0M   0% /run/lock
>     >> >> >> none                                             120M     0  120M   0% /run/shm
>     >> >> >> overflow                                         1.0M  4.0K 1020K   1% /tmp
>     >> >> >> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>     >> >> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G   75G  59% /groups/ch-geni-net/Hadoop-NET
>     >> >> >> 172.17.253.254:/q/proj/ch-geni-net               198G  108G   75G  59% /proj/ch-geni-net
>     >> >> >> root@nn:~#
>     >> >> >>
>     >> >> >>
>     >> >> >> I can see there is no space left on /dev/xvda2.
>     >> >> >>
>     >> >> >> How can I make Hadoop see the newly mounted /dev/xvda4? Or do I
>     >> >> >> need to move the file manually from /dev/xvda2 to xvda4?
>     >> >> >>
>     >> >> >>
>     >> >> >>
>     >> >> >> Thanks & Regards,
>     >> >> >>
>     >> >> >> Abdul Navaz
>     >> >> >> Research Assistant
>     >> >> >> University of Houston Main Campus, Houston TX
>     >> >> >> Ph: 281-685-0388
>     >> >> >>
>     >> >> >
>     >> >> >
>     >> >>
>     >> >
>     >>
>     >
>
>
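The manual rebalancing step described in this thread (stop the DataNode, then mv block files from the old data directory to the new one, preserving the directory layout) can be sketched with stand-in directories. The block-pool layout and file names below are illustrative, and /tmp directories stand in for the real dfs.datanode.data.dir paths:

```shell
# Simulate moving a block between two datanode data directories.
# Real procedure (hedged): stop the DataNode first, mv, then restart it.
set -e
old=$(mktemp -d)   # stands in for the old, nearly-full data dir
new=$(mktemp -d)   # stands in for the new data dir on the bigger disk

# Fake two block files in the old directory's layout (illustrative names).
mkdir -p "$old/current/BP-1/current/finalized/subdir0"
touch "$old/current/BP-1/current/finalized/subdir0/blk_1001"
touch "$old/current/BP-1/current/finalized/subdir0/blk_1002"

# Move one block to the new data dir, keeping the same subtree layout.
mkdir -p "$new/current/BP-1/current/finalized/subdir0"
mv "$old/current/BP-1/current/finalized/subdir0/blk_1002" \
   "$new/current/BP-1/current/finalized/subdir0/"

ls "$new/current/BP-1/current/finalized/subdir0"
```

The key point from the thread survives the simplification: the move must happen while the DataNode is down, and the relative directory structure must be preserved so the DataNode can re-scan its volumes on startup.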


Re: No space when running a hadoop job

Posted by ViSolve Hadoop Support <ha...@visolve.com>.
Hello,

If you want to use drive /dev/xvda4 only, then add file location for 
'/dev/xvda4' and remove the file location for '/dev/xvda2' under 
"dfs.datanode.data.dir".

After the changes restart the hadoop services and check the available 
space using the below command.
      # hadoop fs -df -h

Regards,
ViSolve Hadoop Team

On 10/3/2014 4:36 AM, Abdul Navaz wrote:
> Hello,
>
> As you suggested I have changed the hdfs-site.xml file of datanodes 
> and name node as below and formatted the name node.
>
> </property>
>
> <property>
>
> <name>dfs.datanode.data.dir</name>
>
> <value>/mnt</value>
>
> <description>Comma separated list of paths. Use the list of 
> directories from $DFS_DATA_DIR.
>
>                 For example, 
> /grid/hadoop/hdfs/dn,/grid1/hadoop/hdfs/dn.</description>
>
> </property>
>
>
>
> hduser@dn1:~$ df -h
>
> Filesystem                             Size  Used Avail Use% Mounted on
>
> /dev/xvda2                             5.9G  5.3G  258M  96% /
>
> udev                             98M  4.0K   98M   1% /dev
>
> tmpfs                             48M  196K   48M   1% /run
>
> none                             5.0M     0  5.0M   0% /run/lock
>
> none                             120M     0  120M   0% /run/shm
>
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET 198G  113G   70G  62% 
> /groups/ch-geni-net/Hadoop-NET
>
> 172.17.253.254:/q/proj/ch-geni-net               198G  113G   70G  62% 
> /proj/ch-geni-net
>
> /dev/xvda4                             7.9G  147M  7.4G   2% /mnt
>
> hduser@dn1:~$
>
>
>
> Even after doing so, the file is copied only to /dev/xvda2 instead of 
> /dev/xvda4.
>
> Once /dev/xvda2 is full I am getting the below error message.
>
> hduser@nn:~$ hadoop fs -put file.txtac /user/hduser/getty/file12.txt
>
> Warning: $HADOOP_HOME is deprecated.
>
>
> 14/10/02 16:52:52 WARN hdfs.DFSClient: DataStreamer Exception: 
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File 
> /user/hduser/getty/file12.txt could only be replicated to 0 nodes, 
> instead of 1
>
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
>
>
>
>
> Let me say like this: I don't want to use /dev/xvda2 as it has 
> capacity of 5.9GB , I want to use only /dev/xvda4. How can I do this ?
>
>
>
>
> Thanks & Regards,
>
> Abdul Navaz
> Research Assistant
> University of Houston Main Campus, Houston TX
> Ph: 281-685-0388
>
>
> From: Abdul Navaz <navaz.enc@gmail.com <ma...@gmail.com>>
> Date: Monday, September 29, 2014 at 1:53 PM
> To: <user@hadoop.apache.org <ma...@hadoop.apache.org>>
> Subject: Re: No space when running a hadoop job
>
> Dear All,
>
> I am not doing load balancing here. I am just copying a file and it is 
> throwing me an error no space left on the device.
>
>
> hduser@dn1:~$ df -h
>
> Filesystem                                     Size  Used Avail Use% 
> Mounted on
>
> /dev/xvda2               5.9G  5.1G  533M  91% /
>
> udev                                     98M  4.0K   98M   1% /dev
>
> tmpfs                                     48M  196K   48M   1% /run
>
> none                                     5.0M     0  5.0M   0% /run/lock
>
> none                                     120M     0  120M   0% /run/shm
>
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET 198G  116G   67G  64% 
> /groups/ch-geni-net/Hadoop-NET
>
> 172.17.253.254:/q/proj/ch-geni-net               198G  116G   67G  64% 
> /proj/ch-geni-net
>
> /dev/xvda4               7.9G  147M  7.4G   2% /mnt
>
> hduser@dn1:~$
>
> hduser@dn1:~$
>
> hduser@dn1:~$
>
> hduser@dn1:~$ cp data2.txt data3.txt
>
> cp: writing `data3.txt': No space left on device
>
> cp: failed to extend `data3.txt': No space left on device
>
> hduser@dn1:~$
>
>
> I guess by default it is copying to default location. Why I am getting 
> this error ? How can I fix this ?
>
>
> Thanks & Regards,
>
> Abdul Navaz
> Research Assistant
> University of Houston Main Campus, Houston TX
> Ph: 281-685-0388
>
>
> From: Aitor Cedres <acedres@pivotal.io <ma...@pivotal.io>>
> Reply-To: <user@hadoop.apache.org <ma...@hadoop.apache.org>>
> Date: Monday, September 29, 2014 at 7:53 AM
> To: <user@hadoop.apache.org <ma...@hadoop.apache.org>>
> Subject: Re: No space when running a hadoop job
>
>
> I think they way it works when HDFS has a list 
> in dfs.datanode.data.dir, it's basically a round robin between disks. 
> And yes, it may not be perfect balanced cause of different file sizes.
>
>
> On 29 September 2014 13:15, Susheel Kumar Gadalay <skgadalay@gmail.com 
> <ma...@gmail.com>> wrote:
>
>     Thank Aitor.
>
>     That is what is my observation too.
>
>     I added a new disk location and manually moved some files.
>
>     But if 2 locations are given at the beginning itself for
>     dfs.datanode.data.dir, will hadoop balance the disks usage, if not
>     perfect because file sizes may differ.
>
>     On 9/29/14, Aitor Cedres <acedres@pivotal.io
>     <ma...@pivotal.io>> wrote:
>     > Hi Susheel,
>     >
>     > Adding a new directory to "dfs.datanode.data.dir" will not
>     balance your
>     > disks straightforward. Eventually, by HDFS activity
>     (deleting/invalidating
>     > some block, writing new ones), the disks will become balanced.
>     If you want
>     > to balance them right after adding the new disk and changing the
>     > "dfs.datanode.data.dir"
>     > value, you have to shutdown the DN and manually move (mv) some
>     files in the
>     > old directory to the new one.
>     >
>     > The balancer will try to balance the usage between HDFS nodes,
>     but it won't
>     > care about "internal" node disks utilization. For your
>     particular case, the
>     > balancer won't fix your issue.
>     >
>     > Hope it helps,
>     > Aitor
>     >
>     > On 29 September 2014 05:53, Susheel Kumar Gadalay
>     <skgadalay@gmail.com <ma...@gmail.com>>
>     > wrote:
>     >
>     >> You mean if multiple directory locations are given, Hadoop will
>     >> balance the distribution of files across these different
>     directories.
>     >>
>     >> But normally we start with 1 directory location and once it is
>     >> reaching the maximum, we add new directory.
>     >>
>     >> In this case how can we balance the distribution of files?
>     >>
>     >> One way is to list the files and move.
>     >>
>     >> Will start balance script will work?
>     >>
>     >> On 9/27/14, Alexander Pivovarov <apivovarov@gmail.com
>     <ma...@gmail.com>> wrote:
>     >> > It can read/write in parallel to all drives. More hdd more io
>     speed.
>     >> >  On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay"
>     <skgadalay@gmail.com <ma...@gmail.com>>
>     >> > wrote:
>     >> >
>     >> >> Correct me if I am wrong.
>     >> >>
>     >> >> Adding multiple directories will not balance the files
>     distributions
>     >> >> across these locations.
>     >> >>
>     >> >> Hadoop will exhaust the first directory and then start
>     using the
>     >> >> next, next ..
>     >> >>
>     >> >> How can I tell Hadoop to evenly balance across these
>     directories.
>     >> >>
>     >> >> On 9/26/14, Matt Narrell <matt.narrell@gmail.com
>     <ma...@gmail.com>> wrote:
>     >> >> > You can add a comma separated list of paths to the
>     >> >> "dfs.datanode.data.dir"
>     >> >> > property in your hdfs-site.xml
>     >> >> >
>     >> >> > mn
>     >> >> >
>     >> >> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz
>     <navaz.enc@gmail.com <ma...@gmail.com>>
>     >> >> > wrote:
>     >> >> >
>     >> >> >> Hi
>     >> >> >>
>     >> >> >> I am facing some space issue when I saving file into HDFS
>     and/or
>     >> >> >> running
>     >> >> >> map reduce job.
>     >> >> >>
>     >> >> >> root@nn:~# df -h
>     >> >> >> Filesystem                              Size  Used Avail
>     >> Use%
>     >> >> >> Mounted on
>     >> >> >> /dev/xvda2                              5.9G  5.9G     0
>     >> 100%
>     >> >> >> /
>     >> >> >> udev                               98M  4.0K   98M
>     >>  1%
>     >> >> >> /dev
>     >> >> >> tmpfs                                48M  192K   48M
>     >>  1%
>     >> >> >> /run
>     >> >> >> none                              5.0M     0  5.0M
>     >>  0%
>     >> >> >> /run/lock
>     >> >> >> none                              120M     0  120M
>     >>  0%
>     >> >> >> /run/shm
>     >> >> >> overflow                              1.0M  4.0K 1020K
>     >>  1%
>     >> >> >> /tmp
>     >> >> >> /dev/xvda4                              7.9G  147M  7.4G
>     >>  2%
>     >> >> >> /mnt
>     >> >> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET 198G 
>     108G   75G
>     >> 59%
>     >> >> >> /groups/ch-geni-net/Hadoop-NET
>     >> >> >> 172.17.253.254:/q/proj/ch-geni-net    198G  108G   75G
>     >> 59%
>     >> >> >> /proj/ch-geni-net
>     >> >> >> root@nn:~#
>     >> >> >>
>     >> >> >>
>     >> >> >> I can see there is no space left on /dev/xvda2.
>     >> >> >>
>     >> >> >> How can I make hadoop to see newly mounted /dev/xvda4 ?
>     Or do I
>     >> >> >> need
>     >> >> >> to
>     >> >> >> move the file manually from /dev/xvda2 to xvda4 ?
>     >> >> >>
>     >> >> >>
>     >> >> >>
>     >> >> >> Thanks & Regards,
>     >> >> >>
>     >> >> >> Abdul Navaz
>     >> >> >> Research Assistant
>     >> >> >> University of Houston Main Campus, Houston TX
>     >> >> >> Ph: 281-685-0388
>     >> >> >>
>     >> >> >
>     >> >> >
>     >> >>
>     >> >
>     >>
>     >
>
>


Re: No space when running a hadoop job

Posted by ViSolve Hadoop Support <ha...@visolve.com>.
Hello,

If you want to use the drive /dev/xvda4 only, then add a data directory
located on '/dev/xvda4' and remove the directory located on '/dev/xvda2'
under "dfs.datanode.data.dir".

After the changes, restart the Hadoop services and check the available
space using the command below.
      # hadoop fs -df -h

Regards,
ViSolve Hadoop Team
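
For reference, a minimal hdfs-site.xml fragment along these lines might look
like the sketch below. The subdirectory /mnt/hdfs/data is an assumed example:
it points the datanode at a dedicated, datanode-owned subdirectory of the new
mount rather than the mount point itself. Note that on Hadoop 1.x (which the
"$HADOOP_HOME is deprecated" warning suggests) the property is named
dfs.data.dir rather than dfs.datanode.data.dir.

```xml
<!-- Sketch only: /mnt/hdfs/data is an assumed example path; the directory
     must exist and be writable by the user running the datanode. -->
<property>
  <name>dfs.datanode.data.dir</name>  <!-- use dfs.data.dir on Hadoop 1.x -->
  <value>/mnt/hdfs/data</value>
</property>
```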

On 10/3/2014 4:36 AM, Abdul Navaz wrote:
> Hello,
>
> As you suggested I have changed the hdfs-site.xml file of datanodes 
> and name node as below and formatted the name node.
>
> </property>
>
> <property>
>
> <name>dfs.datanode.data.dir</name>
>
> <value>/mnt</value>
>
> <description>Comma separated list of paths. Use the list of 
> directories from $DFS_DATA_DIR.
>
>                 For example, 
> /grid/hadoop/hdfs/dn,/grid1/hadoop/hdfs/dn.</description>
>
> </property>
>
>
>
> hduser@dn1:~$ df -h
>
> Filesystem                             Size  Used Avail Use% Mounted on
>
> /dev/xvda2                             5.9G  5.3G  258M  96% /
>
> udev                             98M  4.0K   98M   1% /dev
>
> tmpfs                             48M  196K   48M   1% /run
>
> none                             5.0M     0  5.0M   0% /run/lock
>
> none                             120M     0  120M   0% /run/shm
>
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET 198G  113G   70G  62% 
> /groups/ch-geni-net/Hadoop-NET
>
> 172.17.253.254:/q/proj/ch-geni-net               198G  113G   70G  62% 
> /proj/ch-geni-net
>
> /dev/xvda4                             7.9G  147M  7.4G   2% /mnt
>
> hduser@dn1:~$
>
>
>
> Even after doing so, the file is copied only to /dev/xvda2 instead of 
> /dev/xvda4.
>
> Once /dev/xvda2 is full I am getting the below error message.
>
> hduser@nn:~$ hadoop fs -put file.txtac /user/hduser/getty/file12.txt
>
> Warning: $HADOOP_HOME is deprecated.
>
>
> 14/10/02 16:52:52 WARN hdfs.DFSClient: DataStreamer Exception: 
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File 
> /user/hduser/getty/file12.txt could only be replicated to 0 nodes, 
> instead of 1
>
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
>
>
>
>
> Let me say like this: I don't want to use /dev/xvda2 as it has 
> capacity of 5.9GB , I want to use only /dev/xvda4. How can I do this ?
>
>
>
>
> Thanks & Regards,
>
> Abdul Navaz
> Research Assistant
> University of Houston Main Campus, Houston TX
> Ph: 281-685-0388
>
>
> From: Abdul Navaz <navaz.enc@gmail.com <ma...@gmail.com>>
> Date: Monday, September 29, 2014 at 1:53 PM
> To: <user@hadoop.apache.org <ma...@hadoop.apache.org>>
> Subject: Re: No space when running a hadoop job
>
> Dear All,
>
> I am not doing load balancing here. I am just copying a file and it is 
> throwing me an error no space left on the device.
>
>
> hduser@dn1:~$ df -h
>
> Filesystem                                     Size  Used Avail Use% 
> Mounted on
>
> /dev/xvda2               5.9G  5.1G  533M  91% /
>
> udev                                     98M  4.0K   98M   1% /dev
>
> tmpfs                                     48M  196K   48M   1% /run
>
> none                                     5.0M     0  5.0M   0% /run/lock
>
> none                                     120M     0  120M   0% /run/shm
>
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET 198G  116G   67G  64% 
> /groups/ch-geni-net/Hadoop-NET
>
> 172.17.253.254:/q/proj/ch-geni-net               198G  116G   67G  64% 
> /proj/ch-geni-net
>
> /dev/xvda4               7.9G  147M  7.4G   2% /mnt
>
> hduser@dn1:~$
>
> hduser@dn1:~$
>
> hduser@dn1:~$
>
> hduser@dn1:~$ cp data2.txt data3.txt
>
> cp: writing `data3.txt': No space left on device
>
> cp: failed to extend `data3.txt': No space left on device
>
> hduser@dn1:~$
>
>
> I guess by default it is copying to default location. Why I am getting 
> this error ? How can I fix this ?
>
>
> Thanks & Regards,
>
> Abdul Navaz
> Research Assistant
> University of Houston Main Campus, Houston TX
> Ph: 281-685-0388
>
>
> From: Aitor Cedres <acedres@pivotal.io <ma...@pivotal.io>>
> Reply-To: <user@hadoop.apache.org <ma...@hadoop.apache.org>>
> Date: Monday, September 29, 2014 at 7:53 AM
> To: <user@hadoop.apache.org <ma...@hadoop.apache.org>>
> Subject: Re: No space when running a hadoop job
>
>
> I think the way it works when HDFS has a list
> in dfs.datanode.data.dir is basically round robin between disks.
> And yes, it may not be perfectly balanced because of different file sizes.
>
>
> On 29 September 2014 13:15, Susheel Kumar Gadalay <skgadalay@gmail.com 
> <ma...@gmail.com>> wrote:
>
>     Thank Aitor.
>
>     That is what is my observation too.
>
>     I added a new disk location and manually moved some files.
>
>     But if 2 locations are given at the beginning itself for
>     dfs.datanode.data.dir, will hadoop balance the disks usage, if not
>     perfect because file sizes may differ.
>
>     On 9/29/14, Aitor Cedres <acedres@pivotal.io
>     <ma...@pivotal.io>> wrote:
>     > Hi Susheel,
>     >
>     > Adding a new directory to "dfs.datanode.data.dir" will not
>     balance your
>     > disks straightforward. Eventually, by HDFS activity
>     (deleting/invalidating
>     > some block, writing new ones), the disks will become balanced.
>     If you want
>     > to balance them right after adding the new disk and changing the
>     > "dfs.datanode.data.dir"
>     > value, you have to shutdown the DN and manually move (mv) some
>     files in the
>     > old directory to the new one.
>     >
>     > The balancer will try to balance the usage between HDFS nodes,
>     but it won't
>     > care about "internal" node disks utilization. For your
>     particular case, the
>     > balancer won't fix your issue.
>     >
>     > Hope it helps,
>     > Aitor
>     >
>     > On 29 September 2014 05:53, Susheel Kumar Gadalay
>     <skgadalay@gmail.com <ma...@gmail.com>>
>     > wrote:
>     >
>     >> You mean if multiple directory locations are given, Hadoop will
>     >> balance the distribution of files across these different
>     directories.
>     >>
>     >> But normally we start with 1 directory location and once it is
>     >> reaching the maximum, we add new directory.
>     >>
>     >> In this case how can we balance the distribution of files?
>     >>
>     >> One way is to list the files and move.
>     >>
>     >> Will start balance script will work?
>     >>
>     >> On 9/27/14, Alexander Pivovarov <apivovarov@gmail.com
>     <ma...@gmail.com>> wrote:
>     >> > It can read/write in parallel to all drives. More hdd more io
>     speed.
>     >> >  On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay"
>     <skgadalay@gmail.com <ma...@gmail.com>>
>     >> > wrote:
>     >> >
>     >> >> Correct me if I am wrong.
>     >> >>
>     >> >> Adding multiple directories will not balance the files
>     distributions
>     >> >> across these locations.
>     >> >>
>     >> >> Hadoop will exhaust the first directory and then start
>     using the
>     >> >> next, next ..
>     >> >>
>     >> >> How can I tell Hadoop to evenly balance across these
>     directories.
>     >> >>
>     >> >> On 9/26/14, Matt Narrell <matt.narrell@gmail.com
>     <ma...@gmail.com>> wrote:
>     >> >> > You can add a comma separated list of paths to the
>     >> >> "dfs.datanode.data.dir"
>     >> >> > property in your hdfs-site.xml
>     >> >> >
>     >> >> > mn
>     >> >> >
>     >> >> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz
>     <navaz.enc@gmail.com <ma...@gmail.com>>
>     >> >> > wrote:
>     >> >> >
>     >> >> >> Hi
>     >> >> >>
>     >> >> >> I am facing some space issue when I saving file into HDFS
>     and/or
>     >> >> >> running
>     >> >> >> map reduce job.
>     >> >> >>
>     >> >> >> root@nn:~# df -h
>     >> >> >> Filesystem                              Size  Used Avail
>     >> Use%
>     >> >> >> Mounted on
>     >> >> >> /dev/xvda2                              5.9G  5.9G     0
>     >> 100%
>     >> >> >> /
>     >> >> >> udev                               98M  4.0K   98M
>     >>  1%
>     >> >> >> /dev
>     >> >> >> tmpfs                                48M  192K   48M
>     >>  1%
>     >> >> >> /run
>     >> >> >> none                              5.0M     0  5.0M
>     >>  0%
>     >> >> >> /run/lock
>     >> >> >> none                              120M     0  120M
>     >>  0%
>     >> >> >> /run/shm
>     >> >> >> overflow                              1.0M  4.0K 1020K
>     >>  1%
>     >> >> >> /tmp
>     >> >> >> /dev/xvda4                              7.9G  147M  7.4G
>     >>  2%
>     >> >> >> /mnt
>     >> >> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET 198G 
>     108G   75G
>     >> 59%
>     >> >> >> /groups/ch-geni-net/Hadoop-NET
>     >> >> >> 172.17.253.254:/q/proj/ch-geni-net    198G  108G   75G
>     >> 59%
>     >> >> >> /proj/ch-geni-net
>     >> >> >> root@nn:~#
>     >> >> >>
>     >> >> >>
>     >> >> >> I can see there is no space left on /dev/xvda2.
>     >> >> >>
>     >> >> >> How can I make hadoop to see newly mounted /dev/xvda4 ?
>     Or do I
>     >> >> >> need
>     >> >> >> to
>     >> >> >> move the file manually from /dev/xvda2 to xvda4 ?
>     >> >> >>
>     >> >> >>
>     >> >> >>
>     >> >> >> Thanks & Regards,
>     >> >> >>
>     >> >> >> Abdul Navaz
>     >> >> >> Research Assistant
>     >> >> >> University of Houston Main Campus, Houston TX
>     >> >> >> Ph: 281-685-0388
>     >> >> >>
>     >> >> >
>     >> >> >
>     >> >>
>     >> >
>     >>
>     >
>
>


Re: No space when running a hadoop job

Posted by Abdul Navaz <na...@gmail.com>.
Hello,

As you suggested, I have changed the hdfs-site.xml file of the datanodes and
name node as below and formatted the name node.

</property>

<property>

<name>dfs.datanode.data.dir</name>

<value>/mnt</value>

<description>Comma separated list of paths. Use the list of directories from
$DFS_DATA_DIR.

                For example,
/grid/hadoop/hdfs/dn,/grid1/hadoop/hdfs/dn.</description>

</property>



hduser@dn1:~$ df -h

Filesystem                                       Size  Used Avail Use%
Mounted on

/dev/xvda2                                       5.9G  5.3G  258M  96% /

udev                                              98M  4.0K   98M   1% /dev

tmpfs                                             48M  196K   48M   1% /run

none                                             5.0M     0  5.0M   0%
/run/lock

none                                             120M     0  120M   0%
/run/shm

172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  113G   70G  62%
/groups/ch-geni-net/Hadoop-NET

172.17.253.254:/q/proj/ch-geni-net               198G  113G   70G  62%
/proj/ch-geni-net

/dev/xvda4                                       7.9G  147M  7.4G   2% /mnt

hduser@dn1:~$ 



Even after doing so, the file is copied only to /dev/xvda2 instead of
/dev/xvda4.

Once /dev/xvda2 is full I am getting the below error message.

hduser@nn:~$ hadoop fs -put file.txtac /user/hduser/getty/file12.txt

Warning: $HADOOP_HOME is deprecated.



14/10/02 16:52:52 WARN hdfs.DFSClient: DataStreamer Exception:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/user/hduser/getty/file12.txt could only be replicated to 0 nodes, instead
of 1

at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNam
esystem.java:1639)




Let me say it like this: I don't want to use /dev/xvda2, as it has a capacity
of only 5.9GB; I want to use only /dev/xvda4. How can I do this?




Thanks & Regards,

Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388
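
For what it's worth, one possible sequence for moving the datanode fully onto
/dev/xvda4 is sketched below, under assumptions: the path /mnt/hdfs/data and
the hduser:hadoop ownership are illustrative, the scripts are the Hadoop 1.x
ones on the PATH, and reformatting the namenode erases any existing HDFS data.

```sh
# Sketch only: stop HDFS, prepare a datanode directory on the new mount,
# then repoint the config and restart. Paths and user names are examples.
stop-dfs.sh
sudo mkdir -p /mnt/hdfs/data
sudo chown -R hduser:hadoop /mnt/hdfs/data
# In hdfs-site.xml, set dfs.datanode.data.dir (dfs.data.dir on Hadoop 1.x)
# to /mnt/hdfs/data, then either migrate the old blocks over or reformat.
# Reformatting destroys any data already stored in HDFS:
hadoop namenode -format
start-dfs.sh
hadoop fs -df -h   # verify HDFS now reports the capacity of /dev/xvda4
```

A common pitfall here is pointing the datanode directly at /mnt: the mount
point is typically owned by root, so the datanode cannot write to it, and it
silently keeps using the old directory on the full root filesystem.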


From:  Abdul Navaz <na...@gmail.com>
Date:  Monday, September 29, 2014 at 1:53 PM
To:  <us...@hadoop.apache.org>
Subject:  Re: No space when running a hadoop job

Dear All,

I am not doing load balancing here. I am just copying a file and it is
throwing me an error no space left on the device.


hduser@dn1:~$ df -h

Filesystem                                       Size  Used Avail Use%
Mounted on

/dev/xvda2                                       5.9G  5.1G  533M  91% /

udev                                              98M  4.0K   98M   1% /dev

tmpfs                                             48M  196K   48M   1% /run

none                                             5.0M     0  5.0M   0%
/run/lock

none                                             120M     0  120M   0%
/run/shm

172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  116G   67G  64%
/groups/ch-geni-net/Hadoop-NET

172.17.253.254:/q/proj/ch-geni-net               198G  116G   67G  64%
/proj/ch-geni-net

/dev/xvda4                                       7.9G  147M  7.4G   2% /mnt

hduser@dn1:~$ 

hduser@dn1:~$ 

hduser@dn1:~$ 

hduser@dn1:~$ cp data2.txt data3.txt

cp: writing `data3.txt': No space left on device

cp: failed to extend `data3.txt': No space left on device

hduser@dn1:~$ 


I guess by default it is copying to the default location. Why am I getting
this error? How can I fix this?


Thanks & Regards,

Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388


From:  Aitor Cedres <ac...@pivotal.io>
Reply-To:  <us...@hadoop.apache.org>
Date:  Monday, September 29, 2014 at 7:53 AM
To:  <us...@hadoop.apache.org>
Subject:  Re: No space when running a hadoop job


I think the way it works when HDFS has a list in dfs.datanode.data.dir is
basically round robin between disks. And yes, it may not be perfectly
balanced because of different file sizes.


On 29 September 2014 13:15, Susheel Kumar Gadalay <sk...@gmail.com>
wrote:
> Thank Aitor.
> 
> That is what is my observation too.
> 
> I added a new disk location and manually moved some files.
> 
> But if 2 locations are given at the beginning itself for
> dfs.datanode.data.dir, will hadoop balance the disks usage, if not
> perfect because file sizes may differ.
> 
> On 9/29/14, Aitor Cedres <ac...@pivotal.io> wrote:
>> > Hi Susheel,
>> >
>> > Adding a new directory to "dfs.datanode.data.dir" will not balance your
>> > disks straightforward. Eventually, by HDFS activity (deleting/invalidating
>> > some block, writing new ones), the disks will become balanced. If you want
>> > to balance them right after adding the new disk and changing the
>> > "dfs.datanode.data.dir"
>> > value, you have to shutdown the DN and manually move (mv) some files in the
>> > old directory to the new one.
>> >
>> > The balancer will try to balance the usage between HDFS nodes, but it won't
>> > care about "internal" node disks utilization. For your particular case, the
>> > balancer won't fix your issue.
>> >
>> > Hope it helps,
>> > Aitor
>> >
>> > On 29 September 2014 05:53, Susheel Kumar Gadalay <sk...@gmail.com>
>> > wrote:
>> >
>>> >> You mean if multiple directory locations are given, Hadoop will
>>> >> balance the distribution of files across these different directories.
>>> >>
>>> >> But normally we start with 1 directory location and once it is
>>> >> reaching the maximum, we add new directory.
>>> >>
>>> >> In this case how can we balance the distribution of files?
>>> >>
>>> >> One way is to list the files and move.
>>> >>
>>> >> Will start balance script will work?
>>> >>
>>> >> On 9/27/14, Alexander Pivovarov <ap...@gmail.com> wrote:
>>>> >> > It can read/write in parallel to all drives. More hdd more io speed.
>>>> >> >  On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay"
>>>> <sk...@gmail.com>
>>>> >> > wrote:
>>>> >> >
>>>>> >> >> Correct me if I am wrong.
>>>>> >> >>
>>>>> >> >> Adding multiple directories will not balance the files distributions
>>>>> >> >> across these locations.
>>>>> >> >>
>>>>> >> >> Hadoop will exhaust the first directory and then start using the
>>>>> >> >> next, next ..
>>>>> >> >>
>>>>> >> >> How can I tell Hadoop to evenly balance across these directories.
>>>>> >> >>
>>>>> >> >> On 9/26/14, Matt Narrell <ma...@gmail.com> wrote:
>>>>>> >> >> > You can add a comma separated list of paths to the
>>>>> >> >> "dfs.datanode.data.dir"
>>>>>> >> >> > property in your hdfs-site.xml
>>>>>> >> >> >
>>>>>> >> >> > mn
>>>>>> >> >> >
>>>>>> >> >> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz <na...@gmail.com>
>>>>>> >> >> > wrote:
>>>>>> >> >> >
>>>>>>> >> >> >> Hi
>>>>>>> >> >> >>
>>>>>>> >> >> >> I am facing some space issue when I saving file into HDFS
and/or
>>>>>>> >> >> >> running
>>>>>>> >> >> >> map reduce job.
>>>>>>> >> >> >>
>>>>>>> >> >> >> root@nn:~# df -h
>>>>>>> >> >> >> Filesystem                                       Size  Used
Avail
>>> >> Use%
>>>>>>> >> >> >> Mounted on
>>>>>>> >> >> >> /dev/xvda2                                       5.9G  5.9G
0
>>> >> 100%
>>>>>>> >> >> >> /
>>>>>>> >> >> >> udev                                              98M  4.0K
98M
>>> >>  1%
>>>>>>> >> >> >> /dev
>>>>>>> >> >> >> tmpfs                                             48M  192K
48M
>>> >>  1%
>>>>>>> >> >> >> /run
>>>>>>> >> >> >> none                                             5.0M     0
5.0M
>>> >>  0%
>>>>>>> >> >> >> /run/lock
>>>>>>> >> >> >> none                                             120M     0
120M
>>> >>  0%
>>>>>>> >> >> >> /run/shm
>>>>>>> >> >> >> overflow                                         1.0M  4.0K
1020K
>>> >>  1%
>>>>>>> >> >> >> /tmp
>>>>>>> >> >> >> /dev/xvda4                                       7.9G  147M
7.4G
>>> >>  2%
>>>>>>> >> >> >> /mnt
>>>>>>> >> >> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G
75G
>>> >> 59%
>>>>>>> >> >> >> /groups/ch-geni-net/Hadoop-NET
>>>>>>> >> >> >> 172.17.253.254:/q/proj/ch-geni-net               198G  108G
75G
>>> >> 59%
>>>>>>> >> >> >> /proj/ch-geni-net
>>>>>>> >> >> >> root@nn:~#
>>>>>>> >> >> >>
>>>>>>> >> >> >>
>>>>>>> >> >> >> I can see there is no space left on /dev/xvda2.
>>>>>>> >> >> >>
>>>>>>> >> >> >> How can I make hadoop to see newly mounted /dev/xvda4 ? Or do I
>>>>>>> >> >> >> need
>>>>>>> >> >> >> to
>>>>>>> >> >> >> move the file manually from /dev/xvda2 to xvda4 ?
>>>>>>> >> >> >>
>>>>>>> >> >> >>
>>>>>>> >> >> >>
>>>>>>> >> >> >> Thanks & Regards,
>>>>>>> >> >> >>
>>>>>>> >> >> >> Abdul Navaz
>>>>>>> >> >> >> Research Assistant
>>>>>>> >> >> >> University of Houston Main Campus, Houston TX
>>>>>>> >> >> >> Ph: 281-685-0388
>>>>>>> >> >> >>
>>>>>> >> >> >
>>>>>> >> >> >
>>>>> >> >>
>>>> >> >
>>> >>
>> >




Re: No space when running a hadoop job

Posted by Abdul Navaz <na...@gmail.com>.
Hello,

As you suggested I have changed the hdfs-site.xml file of datanodes and name
node as below and formatted the name node.

</property>

<property>

<name>dfs.datanode.data.dir</name>

<value>/mnt</value>

<description>Comma separated list of paths. Use the list of directories from
$DFS_DATA_DIR.

                For example,
/grid/hadoop/hdfs/dn,/grid1/hadoop/hdfs/dn.</description>

</property>



hduser@dn1:~$ df -h

Filesystem                                       Size  Used Avail Use% Mounted on

/dev/xvda2                                       5.9G  5.3G  258M  96% /

udev                                              98M  4.0K   98M   1% /dev

tmpfs                                             48M  196K   48M   1% /run

none                                             5.0M     0  5.0M   0% /run/lock

none                                             120M     0  120M   0% /run/shm

172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  113G   70G  62% /groups/ch-geni-net/Hadoop-NET

172.17.253.254:/q/proj/ch-geni-net               198G  113G   70G  62% /proj/ch-geni-net

/dev/xvda4                                       7.9G  147M  7.4G   2% /mnt

hduser@dn1:~$ 



Even after doing so, the file is copied only to /dev/xvda2 instead of
/dev/xvda4.

Once /dev/xvda2 is full I am getting the below error message.

hduser@nn:~$ hadoop fs -put file.txtac /user/hduser/getty/file12.txt

Warning: $HADOOP_HOME is deprecated.



14/10/02 16:52:52 WARN hdfs.DFSClient: DataStreamer Exception:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/user/hduser/getty/file12.txt could only be replicated to 0 nodes, instead
of 1

at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNam
esystem.java:1639)




Let me put it this way: I don't want to use /dev/xvda2, as it has a capacity
of only 5.9GB; I want to use only /dev/xvda4. How can I do this?
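For reference, dedicating a newly mounted disk to the DataNode usually comes
down to pointing dfs.datanode.data.dir at a writable directory on that disk
and restarting. The sketch below is illustrative, not taken from this thread:
the hduser:hadoop owner and /mnt/hdfs/dn layout are assumptions, and /tmp/mnt
stands in for the real /mnt mount so the commands are runnable without root.

```shell
# Stand-in for the real /mnt mount point; the DataNode only uses a disk if
# dfs.datanode.data.dir names a directory on it that the DN user can write to.
MNT=/tmp/mnt
mkdir -p "$MNT/hdfs/dn"      # use a dedicated subdirectory, not the bare mount
chmod 755 "$MNT/hdfs/dn"     # real setup: chown -R hduser:hadoop "$MNT/hdfs"
# hdfs-site.xml would then carry:
#   <property>
#     <name>dfs.datanode.data.dir</name>
#     <value>/mnt/hdfs/dn</value>
#   </property>
# followed by a DataNode restart (stop-dfs.sh && start-dfs.sh).
ls -ld "$MNT/hdfs/dn"
```

Note that pointing the property at the bare mount point (/mnt) often fails on
permissions, since mount points are typically root-owned.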




Thanks & Regards,

Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388


From:  Abdul Navaz <na...@gmail.com>
Date:  Monday, September 29, 2014 at 1:53 PM
To:  <us...@hadoop.apache.org>
Subject:  Re: No space when running a hadoop job

Dear All,

I am not doing load balancing here. I am just copying a file, and it is
failing with a "no space left on device" error.


hduser@dn1:~$ df -h

Filesystem                                       Size  Used Avail Use% Mounted on

/dev/xvda2                                       5.9G  5.1G  533M  91% /

udev                                              98M  4.0K   98M   1% /dev

tmpfs                                             48M  196K   48M   1% /run

none                                             5.0M     0  5.0M   0% /run/lock

none                                             120M     0  120M   0% /run/shm

172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  116G   67G  64% /groups/ch-geni-net/Hadoop-NET

172.17.253.254:/q/proj/ch-geni-net               198G  116G   67G  64% /proj/ch-geni-net

/dev/xvda4                                       7.9G  147M  7.4G   2% /mnt

hduser@dn1:~$ 

hduser@dn1:~$ 

hduser@dn1:~$ 

hduser@dn1:~$ cp data2.txt data3.txt

cp: writing `data3.txt': No space left on device

cp: failed to extend `data3.txt': No space left on device

hduser@dn1:~$ 


I guess it is copying to the default location. Why am I getting this error?
How can I fix it?


Thanks & Regards,

Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388


From:  Aitor Cedres <ac...@pivotal.io>
Reply-To:  <us...@hadoop.apache.org>
Date:  Monday, September 29, 2014 at 7:53 AM
To:  <us...@hadoop.apache.org>
Subject:  Re: No space when running a hadoop job


I think the way it works when HDFS has a list in dfs.datanode.data.dir
is basically a round robin between disks. And yes, it may not be perfectly
balanced because of different file sizes.
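The round-robin placement described above can be illustrated with a toy
sketch; the directory names and block names are made up for the demo (a real
DataNode writes blk_* and blk_*.meta files under each data dir's current/
subtree):

```shell
# Toy round-robin placement of "blocks" across two data dirs.
d1=/tmp/rr-demo/d1
d2=/tmp/rr-demo/d2
mkdir -p "$d1" "$d2"
i=0
for blk in blk_1 blk_2 blk_3 blk_4; do
  # alternate between the two dirs, regardless of how full each one is
  if [ $((i % 2)) -eq 0 ]; then dir=$d1; else dir=$d2; fi
  touch "$dir/$blk"
  i=$((i + 1))
done
ls "$d1" "$d2"
```

Since placement alternates per block rather than per byte, dirs holding blocks
of different sizes end up unevenly filled, which matches the caveat above.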


On 29 September 2014 13:15, Susheel Kumar Gadalay <sk...@gmail.com>
wrote:
> Thanks, Aitor.
> 
> That is my observation too.
> 
> I added a new disk location and manually moved some files.
> 
> But if 2 locations are given for dfs.datanode.data.dir from the
> beginning, will Hadoop balance the disk usage, even if not perfectly,
> given that file sizes may differ?
> 
> On 9/29/14, Aitor Cedres <ac...@pivotal.io> wrote:
>> > Hi Susheel,
>> >
>> > Adding a new directory to "dfs.datanode.data.dir" will not balance your
>> > disks straight away. Eventually, through HDFS activity (deleting/invalidating
>> > some blocks, writing new ones), the disks will become balanced. If you want
>> > to balance them right after adding the new disk and changing the
>> > "dfs.datanode.data.dir"
>> > value, you have to shut down the DN and manually move (mv) some files from the
>> > old directory to the new one.
>> >
>> > The balancer will try to balance the usage between HDFS nodes, but it won't
>> > care about "internal" node disks utilization. For your particular case, the
>> > balancer won't fix your issue.
>> >
>> > Hope it helps,
>> > Aitor
>> >
>> > On 29 September 2014 05:53, Susheel Kumar Gadalay <sk...@gmail.com>
>> > wrote:
>> >
>>> >> You mean if multiple directory locations are given, Hadoop will
>>> >> balance the distribution of files across these different directories.
>>> >>
>>> >> But normally we start with 1 directory location, and once it is
>>> >> reaching its maximum, we add a new directory.
>>> >>
>>> >> In this case how can we balance the distribution of files?
>>> >>
>>> >> One way is to list the files and move.
>>> >>
>>> >> Will the start-balancer script work?
>>> >>
>>> >> On 9/27/14, Alexander Pivovarov <ap...@gmail.com> wrote:
>>>> >> > It can read/write in parallel to all drives. More HDDs, more I/O speed.
>>>> >> >  On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay"
>>>> <sk...@gmail.com>
>>>> >> > wrote:
>>>> >> >
>>>>> >> >> Correct me if I am wrong.
>>>>> >> >>
>>>>> >> >> Adding multiple directories will not balance the file distribution
>>>>> >> >> across these locations.
>>>>> >> >>
>>>>> >> >> Hadoop will exhaust the first directory and then start using the
>>>>> >> >> next, and so on.
>>>>> >> >>
>>>>> >> >> How can I tell Hadoop to evenly balance across these directories?
>>>>> >> >>
>>>>> >> >> On 9/26/14, Matt Narrell <ma...@gmail.com> wrote:
>>>>>> >> >> > You can add a comma-separated list of paths to the
>>>>> >> >> "dfs.datanode.data.dir"
>>>>>> >> >> > property in your hdfs-site.xml
>>>>>> >> >> >
>>>>>> >> >> > mn
>>>>>> >> >> >
>>>>>> >> >> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz <na...@gmail.com>
>>>>>> >> >> > wrote:
>>>>>> >> >> >
>>>>>>> >> >> >> Hi
>>>>>>> >> >> >>
>>>>>>> >> >> >> I am facing a space issue when saving files into HDFS and/or
>>>>>>> >> >> >> running a map reduce job.
>>>>>>> >> >> >>
>>>>>>> >> >> >> root@nn:~# df -h
>>>>>>> >> >> >> Filesystem                                       Size  Used Avail Use% Mounted on
>>>>>>> >> >> >> /dev/xvda2                                       5.9G  5.9G     0 100% /
>>>>>>> >> >> >> udev                                              98M  4.0K   98M   1% /dev
>>>>>>> >> >> >> tmpfs                                             48M  192K   48M   1% /run
>>>>>>> >> >> >> none                                             5.0M     0  5.0M   0% /run/lock
>>>>>>> >> >> >> none                                             120M     0  120M   0% /run/shm
>>>>>>> >> >> >> overflow                                         1.0M  4.0K 1020K   1% /tmp
>>>>>>> >> >> >> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>>>>>>> >> >> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G   75G  59% /groups/ch-geni-net/Hadoop-NET
>>>>>>> >> >> >> 172.17.253.254:/q/proj/ch-geni-net               198G  108G   75G  59% /proj/ch-geni-net
>>>>>>> >> >> >> root@nn:~#
>>>>>>> >> >> >>
>>>>>>> >> >> >>
>>>>>>> >> >> >> I can see there is no space left on /dev/xvda2.
>>>>>>> >> >> >>
>>>>>>> >> >> >> How can I make hadoop to see newly mounted /dev/xvda4 ? Or do I
>>>>>>> >> >> >> need
>>>>>>> >> >> >> to
>>>>>>> >> >> >> move the file manually from /dev/xvda2 to xvda4 ?
>>>>>>> >> >> >>
>>>>>>> >> >> >>
>>>>>>> >> >> >>
>>>>>>> >> >> >> Thanks & Regards,
>>>>>>> >> >> >>
>>>>>>> >> >> >> Abdul Navaz
>>>>>>> >> >> >> Research Assistant
>>>>>>> >> >> >> University of Houston Main Campus, Houston TX
>>>>>>> >> >> >> Ph: 281-685-0388
>>>>>>> >> >> >>
>>>>>> >> >> >
>>>>>> >> >> >
>>>>> >> >>
>>>> >> >
>>> >>
>> >
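The manual rebalancing step Aitor describes in the quoted reply (DataNode
stopped, then mv block files from the old data dir to the new one) amounts to
something like the following sketch; the dn-old/dn-new paths and block names
are invented for the demo, and on a real node each data dir's blocks live
under its current/ subtree:

```shell
# Manual intra-node rebalance sketch: with the DataNode STOPPED, move some
# block files (each blk_* together with its matching blk_*.meta) over.
old=/tmp/dn-old/current
new=/tmp/dn-new/current
mkdir -p "$old" "$new"
touch "$old/blk_1001" "$old/blk_1001_1.meta"   # fake block + metadata pair
mv "$old"/blk_1001* "$new"/                    # keep the pair together
ls "$new"
```

Moving a block without its .meta file (or while the DN is running) would make
the DataNode report the block as corrupt or missing, which is why the pair is
moved together and the daemon is stopped first.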




Re: No space when running a hadoop job

Posted by Abdul Navaz <na...@gmail.com>.
Dear All,

I am not doing load balancing here. I am just copying a file, and it is
failing with a "no space left on device" error.


hduser@dn1:~$ df -h

Filesystem                                       Size  Used Avail Use% Mounted on

/dev/xvda2                                       5.9G  5.1G  533M  91% /

udev                                              98M  4.0K   98M   1% /dev

tmpfs                                             48M  196K   48M   1% /run

none                                             5.0M     0  5.0M   0% /run/lock

none                                             120M     0  120M   0% /run/shm

172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  116G   67G  64% /groups/ch-geni-net/Hadoop-NET

172.17.253.254:/q/proj/ch-geni-net               198G  116G   67G  64% /proj/ch-geni-net

/dev/xvda4                                       7.9G  147M  7.4G   2% /mnt

hduser@dn1:~$ 

hduser@dn1:~$ 

hduser@dn1:~$ 

hduser@dn1:~$ cp data2.txt data3.txt

cp: writing `data3.txt': No space left on device

cp: failed to extend `data3.txt': No space left on device

hduser@dn1:~$ 


I guess it is copying to the default location. Why am I getting this error?
How can I fix it?
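A quick way to diagnose errors like the cp failure above is to check which
filesystem the destination path actually lands on, and how much room it has,
before copying. The demo below uses arbitrary files under /tmp so it is
runnable anywhere; df's --output and stat -c are GNU coreutils options:

```shell
# Check the destination's mount point and free space before copying.
src=/tmp/space-demo-src.txt
dst_dir=/tmp
printf 'some data\n' > "$src"
df -h --output=target,avail "$dst_dir" | tail -1   # which mount, how much room
need=$(stat -c %s "$src")                          # bytes required
avail=$(( $(df -k --output=avail "$dst_dir" | tail -1) * 1024 ))
if [ "$avail" -gt "$need" ]; then
  cp "$src" "$dst_dir/space-demo-dst.txt"
fi
ls -l "$dst_dir/space-demo-dst.txt"
```

In this thread's case, `df --output=target ~` would show that the home
directory sits on the full / filesystem rather than on the new /mnt disk,
which is why the plain cp fails.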


Thanks & Regards,

Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388


From:  Aitor Cedres <ac...@pivotal.io>
Reply-To:  <us...@hadoop.apache.org>
Date:  Monday, September 29, 2014 at 7:53 AM
To:  <us...@hadoop.apache.org>
Subject:  Re: No space when running a hadoop job


I think the way it works when HDFS has a list in dfs.datanode.data.dir
is basically a round robin between disks. And yes, it may not be perfectly
balanced because of different file sizes.


On 29 September 2014 13:15, Susheel Kumar Gadalay <sk...@gmail.com>
wrote:
> Thanks, Aitor.
> 
> That is my observation too.
> 
> I added a new disk location and manually moved some files.
> 
> But if 2 locations are given for dfs.datanode.data.dir from the
> beginning, will Hadoop balance the disk usage, even if not perfectly,
> given that file sizes may differ?
> 
> On 9/29/14, Aitor Cedres <ac...@pivotal.io> wrote:
>> > Hi Susheel,
>> >
>> > Adding a new directory to "dfs.datanode.data.dir" will not balance your
>> > disks straight away. Eventually, through HDFS activity (deleting/invalidating
>> > some blocks, writing new ones), the disks will become balanced. If you want
>> > to balance them right after adding the new disk and changing the
>> > "dfs.datanode.data.dir"
>> > value, you have to shut down the DN and manually move (mv) some files from the
>> > old directory to the new one.
>> >
>> > The balancer will try to balance the usage between HDFS nodes, but it won't
>> > care about "internal" node disks utilization. For your particular case, the
>> > balancer won't fix your issue.
>> >
>> > Hope it helps,
>> > Aitor
>> >
>> > On 29 September 2014 05:53, Susheel Kumar Gadalay <sk...@gmail.com>
>> > wrote:
>> >
>>> >> You mean if multiple directory locations are given, Hadoop will
>>> >> balance the distribution of files across these different directories.
>>> >>
>>> >> But normally we start with 1 directory location, and once it is
>>> >> reaching its maximum, we add a new directory.
>>> >>
>>> >> In this case how can we balance the distribution of files?
>>> >>
>>> >> One way is to list the files and move.
>>> >>
>>> >> Will the start-balancer script work?
>>> >>
>>> >> On 9/27/14, Alexander Pivovarov <ap...@gmail.com> wrote:
>>>> >> > It can read/write in parallel to all drives. More HDDs, more I/O speed.
>>>> >> >  On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay"
>>>> <sk...@gmail.com>
>>>> >> > wrote:
>>>> >> >
>>>>> >> >> Correct me if I am wrong.
>>>>> >> >>
>>>>> >> >> Adding multiple directories will not balance the file distribution
>>>>> >> >> across these locations.
>>>>> >> >>
>>>>> >> >> Hadoop will exhaust the first directory and then start using the
>>>>> >> >> next, and so on.
>>>>> >> >>
>>>>> >> >> How can I tell Hadoop to evenly balance across these directories?
>>>>> >> >>
>>>>> >> >> On 9/26/14, Matt Narrell <ma...@gmail.com> wrote:
>>>>>> >> >> > You can add a comma-separated list of paths to the
>>>>> >> >> "dfs.datanode.data.dir"
>>>>>> >> >> > property in your hdfs-site.xml
>>>>>> >> >> >
>>>>>> >> >> > mn
>>>>>> >> >> >
>>>>>> >> >> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz <na...@gmail.com>
>>>>>> >> >> > wrote:
>>>>>> >> >> >
>>>>>>> >> >> >> Hi
>>>>>>> >> >> >>
>>>>>>> >> >> >> I am facing a space issue when saving files into HDFS and/or
>>>>>>> >> >> >> running a map reduce job.
>>>>>>> >> >> >>
>>>>>>> >> >> >> root@nn:~# df -h
>>>>>>> >> >> >> Filesystem                                       Size  Used Avail Use% Mounted on
>>>>>>> >> >> >> /dev/xvda2                                       5.9G  5.9G     0 100% /
>>>>>>> >> >> >> udev                                              98M  4.0K   98M   1% /dev
>>>>>>> >> >> >> tmpfs                                             48M  192K   48M   1% /run
>>>>>>> >> >> >> none                                             5.0M     0  5.0M   0% /run/lock
>>>>>>> >> >> >> none                                             120M     0  120M   0% /run/shm
>>>>>>> >> >> >> overflow                                         1.0M  4.0K 1020K   1% /tmp
>>>>>>> >> >> >> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>>>>>>> >> >> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G   75G  59% /groups/ch-geni-net/Hadoop-NET
>>>>>>> >> >> >> 172.17.253.254:/q/proj/ch-geni-net               198G  108G   75G  59% /proj/ch-geni-net
>>>>>>> >> >> >> root@nn:~#
>>>>>>> >> >> >>
>>>>>>> >> >> >>
>>>>>>> >> >> >> I can see there is no space left on /dev/xvda2.
>>>>>>> >> >> >>
>>>>>>> >> >> >> How can I make hadoop to see newly mounted /dev/xvda4 ? Or do I
>>>>>>> >> >> >> need
>>>>>>> >> >> >> to
>>>>>>> >> >> >> move the file manually from /dev/xvda2 to xvda4 ?
>>>>>>> >> >> >>
>>>>>>> >> >> >>
>>>>>>> >> >> >>
>>>>>>> >> >> >> Thanks & Regards,
>>>>>>> >> >> >>
>>>>>>> >> >> >> Abdul Navaz
>>>>>>> >> >> >> Research Assistant
>>>>>>> >> >> >> University of Houston Main Campus, Houston TX
>>>>>>> >> >> >> Ph: 281-685-0388
>>>>>>> >> >> >>
>>>>>> >> >> >
>>>>>> >> >> >
>>>>> >> >>
>>>> >> >
>>> >>
>> >
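For reference, the comma-separated form suggested in the quoted reply above would look like the sketch below in hdfs-site.xml. The /mnt path matches the new disk in this thread; the first path is illustrative, since the original data directory varies by install:

```xml
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- first path: existing data dir (illustrative); second: the new /mnt disk -->
  <value>/app/hadoop/tmp/dfs/data,/mnt/dfs/data</value>
</property>
```

After editing, restart the DataNode only. Reformatting the NameNode, as attempted above, wipes the HDFS metadata and is not needed for this change.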




Re: No space when running a hadoop job

Posted by Abdul Navaz <na...@gmail.com>.
Dear All,

I am not doing load balancing here. I am just copying a file and it is
throwing me an error no space left on the device.


hduser@dn1:~$ df -h

Filesystem                                       Size  Used Avail Use%
Mounted on

/dev/xvda2                                       5.9G  5.1G  533M  91% /

udev                                              98M  4.0K   98M   1% /dev

tmpfs                                             48M  196K   48M   1% /run

none                                             5.0M     0  5.0M   0%
/run/lock

none                                             120M     0  120M   0%
/run/shm

172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  116G   67G  64%
/groups/ch-geni-net/Hadoop-NET

172.17.253.254:/q/proj/ch-geni-net               198G  116G   67G  64%
/proj/ch-geni-net

/dev/xvda4                                       7.9G  147M  7.4G   2% /mnt

hduser@dn1:~$ 

hduser@dn1:~$ 

hduser@dn1:~$ 

hduser@dn1:~$ cp data2.txt data3.txt

cp: writing `data3.txt': No space left on device

cp: failed to extend `data3.txt': No space left on device

hduser@dn1:~$ 


I guess by default it is copying to default location. Why I am getting this
error ? How can I fix this ?


Thanks & Regards,

Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388


From:  Aitor Cedres <ac...@pivotal.io>
Reply-To:  <us...@hadoop.apache.org>
Date:  Monday, September 29, 2014 at 7:53 AM
To:  <us...@hadoop.apache.org>
Subject:  Re: No space when running a hadoop job


I think the way it works when HDFS has a list in dfs.datanode.data.dir is
basically round robin between the disks. And yes, they may not be perfectly
balanced because of different file sizes.


On 29 September 2014 13:15, Susheel Kumar Gadalay <sk...@gmail.com>
wrote:
> Thank Aitor.
> 
> That is what is my observation too.
> 
> I added a new disk location and manually moved some files.
> 
> But if 2 locations are given at the beginning itself for
> dfs.datanode.data.dir, will hadoop balance the disks usage, if not
> perfect because file sizes may differ.
> 
> On 9/29/14, Aitor Cedres <ac...@pivotal.io> wrote:
>> > Hi Susheel,
>> >
>> > Adding a new directory to “dfs.datanode.data.dir” will not balance your
>> > disks straightforward. Eventually, by HDFS activity (deleting/invalidating
>> > some block, writing new ones), the disks will become balanced. If you want
>> > to balance them right after adding the new disk and changing the
>> > “dfs.datanode.data.dir”
>> > value, you have to shutdown the DN and manually move (mv) some files in the
>> > old directory to the new one.
>> >
>> > The balancer will try to balance the usage between HDFS nodes, but it won't
>> > care about "internal" node disks utilization. For your particular case, the
>> > balancer won't fix your issue.
>> >
>> > Hope it helps,
>> > Aitor
>> >
>> > On 29 September 2014 05:53, Susheel Kumar Gadalay <sk...@gmail.com>
>> > wrote:
>> >
>>> >> You mean if multiple directory locations are given, Hadoop will
>>> >> balance the distribution of files across these different directories.
>>> >>
>>> >> But normally we start with 1 directory location and once it is
>>> >> reaching the maximum, we add new directory.
>>> >>
>>> >> In this case how can we balance the distribution of files?
>>> >>
>>> >> One way is to list the files and move.
>>> >>
>>> >> Will start balance script will work?
>>> >>
>>> >> On 9/27/14, Alexander Pivovarov <ap...@gmail.com> wrote:
>>>> >> > It can read/write in parallel to all drives. More hdd more io speed.
>>>> >> >  On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay"
>>>> <sk...@gmail.com>
>>>> >> > wrote:
>>>> >> >
>>>>> >> >> Correct me if I am wrong.
>>>>> >> >>
>>>>> >> >> Adding multiple directories will not balance the files distributions
>>>>> >> >> across these locations.
>>>>> >> >>
>>>>> >> >> Hadoop will exhaust the first directory and then start using the
>>>>> >> >> next, next ..
>>>>> >> >>
>>>>> >> >> How can I tell Hadoop to evenly balance across these directories.
>>>>> >> >>
>>>>> >> >> On 9/26/14, Matt Narrell <ma...@gmail.com> wrote:
>>>>>> >> >> > You can add a comma separated list of paths to the
>>>>> >> >> “dfs.datanode.data.dir”
>>>>>> >> >> > property in your hdfs-site.xml
>>>>>> >> >> >
>>>>>> >> >> > mn
>>>>>> >> >> >
>>>>>> >> >> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz <na...@gmail.com>
>>>>>> >> >> > wrote:
>>>>>> >> >> >
>>>>>>> >> >> >> Hi
>>>>>>> >> >> >>
>>>>>>> >> >> >> I am facing some space issue when I saving file into HDFS
and/or
>>>>>>> >> >> >> running
>>>>>>> >> >> >> map reduce job.
>>>>>>> >> >> >>
>>>>>>> >> >> >> root@nn:~# df -h
>>>>>>> >> >> >> Filesystem                                       Size  Used
Avail
>>> >> Use%
>>>>>>> >> >> >> Mounted on
>>>>>>> >> >> >> /dev/xvda2                                       5.9G  5.9G
0
>>> >> 100%
>>>>>>> >> >> >> /
>>>>>>> >> >> >> udev                                              98M  4.0K
98M
>>> >>  1%
>>>>>>> >> >> >> /dev
>>>>>>> >> >> >> tmpfs                                             48M  192K
48M
>>> >>  1%
>>>>>>> >> >> >> /run
>>>>>>> >> >> >> none                                             5.0M     0
5.0M
>>> >>  0%
>>>>>>> >> >> >> /run/lock
>>>>>>> >> >> >> none                                             120M     0
120M
>>> >>  0%
>>>>>>> >> >> >> /run/shm
>>>>>>> >> >> >> overflow                                         1.0M  4.0K
1020K
>>> >>  1%
>>>>>>> >> >> >> /tmp
>>>>>>> >> >> >> /dev/xvda4                                       7.9G  147M
7.4G
>>> >>  2%
>>>>>>> >> >> >> /mnt
>>>>>>> >> >> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G
75G
>>> >> 59%
>>>>>>> >> >> >> /groups/ch-geni-net/Hadoop-NET
>>>>>>> >> >> >> 172.17.253.254:/q/proj/ch-geni-net               198G  108G
75G
>>> >> 59%
>>>>>>> >> >> >> /proj/ch-geni-net
>>>>>>> >> >> >> root@nn:~#
>>>>>>> >> >> >>
>>>>>>> >> >> >>
>>>>>>> >> >> >> I can see there is no space left on /dev/xvda2.
>>>>>>> >> >> >>
>>>>>>> >> >> >> How can I make hadoop to see newly mounted /dev/xvda4 ? Or do I
>>>>>>> >> >> >> need
>>>>>>> >> >> >> to
>>>>>>> >> >> >> move the file manually from /dev/xvda2 to xvda4 ?
>>>>>>> >> >> >>
>>>>>>> >> >> >>
>>>>>>> >> >> >>
>>>>>>> >> >> >> Thanks & Regards,
>>>>>>> >> >> >>
>>>>>>> >> >> >> Abdul Navaz
>>>>>>> >> >> >> Research Assistant
>>>>>>> >> >> >> University of Houston Main Campus, Houston TX
>>>>>>> >> >> >> Ph: 281-685-0388
>>>>>>> >> >> >>
>>>>>> >> >> >
>>>>>> >> >> >
>>>>> >> >>
>>>> >> >
>>> >>
>> >




Re: No space when running a hadoop job

Posted by Aitor Cedres <ac...@pivotal.io>.
I think the way it works when HDFS has a list in dfs.datanode.data.dir is
basically round robin between the disks. And yes, they may not be perfectly
balanced because of different file sizes.


On 29 September 2014 13:15, Susheel Kumar Gadalay <sk...@gmail.com>
wrote:

> Thank Aitor.
>
> That is what is my observation too.
>
> I added a new disk location and manually moved some files.
>
> But if 2 locations are given at the beginning itself for
> dfs.datanode.data.dir, will hadoop balance the disks usage, if not
> perfect because file sizes may differ.
>
> On 9/29/14, Aitor Cedres <ac...@pivotal.io> wrote:
> > Hi Susheel,
> >
> > Adding a new directory to “dfs.datanode.data.dir” will not balance your
> > disks straightforward. Eventually, by HDFS activity
> (deleting/invalidating
> > some block, writing new ones), the disks will become balanced. If you
> want
> > to balance them right after adding the new disk and changing the
> > “dfs.datanode.data.dir”
> > value, you have to shutdown the DN and manually move (mv) some files in
> the
> > old directory to the new one.
> >
> > The balancer will try to balance the usage between HDFS nodes, but it
> won't
> > care about "internal" node disks utilization. For your particular case,
> the
> > balancer won't fix your issue.
> >
> > Hope it helps,
> > Aitor
> >
> > On 29 September 2014 05:53, Susheel Kumar Gadalay <sk...@gmail.com>
> > wrote:
> >
> >> You mean if multiple directory locations are given, Hadoop will
> >> balance the distribution of files across these different directories.
> >>
> >> But normally we start with 1 directory location and once it is
> >> reaching the maximum, we add new directory.
> >>
> >> In this case how can we balance the distribution of files?
> >>
> >> One way is to list the files and move.
> >>
> >> Will start balance script will work?
> >>
> >> On 9/27/14, Alexander Pivovarov <ap...@gmail.com> wrote:
> >> > It can read/write in parallel to all drives. More hdd more io speed.
> >> >  On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay" <
> skgadalay@gmail.com>
> >> > wrote:
> >> >
> >> >> Correct me if I am wrong.
> >> >>
> >> >> Adding multiple directories will not balance the files distributions
> >> >> across these locations.
> >> >>
> >> >> Hadoop will exhaust the first directory and then start using the
> >> >> next, next ..
> >> >>
> >> >> How can I tell Hadoop to evenly balance across these directories.
> >> >>
> >> >> On 9/26/14, Matt Narrell <ma...@gmail.com> wrote:
> >> >> > You can add a comma separated list of paths to the
> >> >> “dfs.datanode.data.dir”
> >> >> > property in your hdfs-site.xml
> >> >> >
> >> >> > mn
> >> >> >
> >> >> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz <na...@gmail.com>
> >> >> > wrote:
> >> >> >
> >> >> >> Hi
> >> >> >>
> >> >> >> I am facing some space issue when I saving file into HDFS and/or
> >> >> >> running
> >> >> >> map reduce job.
> >> >> >>
> >> >> >> root@nn:~# df -h
> >> >> >> Filesystem                                       Size  Used Avail
> >> Use%
> >> >> >> Mounted on
> >> >> >> /dev/xvda2                                       5.9G  5.9G     0
> >> 100%
> >> >> >> /
> >> >> >> udev                                              98M  4.0K   98M
> >>  1%
> >> >> >> /dev
> >> >> >> tmpfs                                             48M  192K   48M
> >>  1%
> >> >> >> /run
> >> >> >> none                                             5.0M     0  5.0M
> >>  0%
> >> >> >> /run/lock
> >> >> >> none                                             120M     0  120M
> >>  0%
> >> >> >> /run/shm
> >> >> >> overflow                                         1.0M  4.0K 1020K
> >>  1%
> >> >> >> /tmp
> >> >> >> /dev/xvda4                                       7.9G  147M  7.4G
> >>  2%
> >> >> >> /mnt
> >> >> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G   75G
> >> 59%
> >> >> >> /groups/ch-geni-net/Hadoop-NET
> >> >> >> 172.17.253.254:/q/proj/ch-geni-net               198G  108G   75G
> >> 59%
> >> >> >> /proj/ch-geni-net
> >> >> >> root@nn:~#
> >> >> >>
> >> >> >>
> >> >> >> I can see there is no space left on /dev/xvda2.
> >> >> >>
> >> >> >> How can I make hadoop to see newly mounted /dev/xvda4 ? Or do I
> >> >> >> need
> >> >> >> to
> >> >> >> move the file manually from /dev/xvda2 to xvda4 ?
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> Thanks & Regards,
> >> >> >>
> >> >> >> Abdul Navaz
> >> >> >> Research Assistant
> >> >> >> University of Houston Main Campus, Houston TX
> >> >> >> Ph: 281-685-0388
> >> >> >>
> >> >> >
> >> >> >
> >> >>
> >> >
> >>
> >
>
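A note on the "No space left on device" from cp earlier in the thread: that failure is on the local filesystem, not HDFS, because the home directory sits on the nearly full / partition rather than the empty /mnt disk. A quick way to confirm which mount actually backs a path (a minimal sketch; the specific mount points are from the df output above):

```shell
#!/bin/sh
# Print the mount point that backs a given path; -P gives stable POSIX columns.
mount_of() { df -P "$1" | awk 'NR==2 {print $6}'; }

mount_of "$HOME"   # on the nodes above this resolves to "/", the full disk
mount_of /         # prints "/"
```

If the home directory's mount is full, copy the file somewhere under /mnt instead, or point dfs.datanode.data.dir there for HDFS data.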

> >> >> >> Ph: 281-685-0388
> >> >> >>
> >> >> >
> >> >> >
> >> >>
> >> >
> >>
> >
>

Re: No space when running a hadoop job

Posted by Aitor Cedres <ac...@pivotal.io>.
I think the way it works when HDFS has a list in dfs.datanode.data.dir
is basically round-robin between disks. And yes, it may not be perfectly
balanced because of differing file sizes.
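
For reference, listing several mount points in hdfs-site.xml might look
roughly like the fragment below; the /data/1 and /data/2 paths are
hypothetical and stand in for whatever disks are mounted on the node:

```xml
<!-- hdfs-site.xml (sketch): a comma-separated list of directories, one
     per disk, so the DataNode round-robins new block writes across them.
     The /data/1 and /data/2 mount points are hypothetical examples. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/1/dfs/dn,/data/2/dfs/dn</value>
</property>
```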


On 29 September 2014 13:15, Susheel Kumar Gadalay <sk...@gmail.com>
wrote:

> Thank Aitor.
>
> That is what is my observation too.
>
> I added a new disk location and manually moved some files.
>
> But if 2 locations are given at the beginning itself for
> dfs.datanode.data.dir, will hadoop balance the disks usage, if not
> perfect because file sizes may differ.
>
> On 9/29/14, Aitor Cedres <ac...@pivotal.io> wrote:
> > Hi Susheel,
> >
> > Adding a new directory to “dfs.datanode.data.dir” will not balance your
> > disks straightforward. Eventually, by HDFS activity
> (deleting/invalidating
> > some block, writing new ones), the disks will become balanced. If you
> want
> > to balance them right after adding the new disk and changing the
> > “dfs.datanode.data.dir”
> > value, you have to shutdown the DN and manually move (mv) some files in
> the
> > old directory to the new one.
> >
> > The balancer will try to balance the usage between HDFS nodes, but it
> won't
> > care about "internal" node disks utilization. For your particular case,
> the
> > balancer won't fix your issue.
> >
> > Hope it helps,
> > Aitor
> >
> > On 29 September 2014 05:53, Susheel Kumar Gadalay <sk...@gmail.com>
> > wrote:
> >
> >> You mean if multiple directory locations are given, Hadoop will
> >> balance the distribution of files across these different directories.
> >>
> >> But normally we start with 1 directory location and once it is
> >> reaching the maximum, we add new directory.
> >>
> >> In this case how can we balance the distribution of files?
> >>
> >> One way is to list the files and move.
> >>
> >> Will start balance script will work?
> >>
> >> On 9/27/14, Alexander Pivovarov <ap...@gmail.com> wrote:
> >> > It can read/write in parallel to all drives. More hdd more io speed.
> >> >  On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay" <
> skgadalay@gmail.com>
> >> > wrote:
> >> >
> >> >> Correct me if I am wrong.
> >> >>
> >> >> Adding multiple directories will not balance the files distributions
> >> >> across these locations.
> >> >>
> >> >> Hadoop will add exhaust the first directory and then start using the
> >> >> next, next ..
> >> >>
> >> >> How can I tell Hadoop to evenly balance across these directories.
> >> >>
> >> >> On 9/26/14, Matt Narrell <ma...@gmail.com> wrote:
> >> >> > You can add a comma separated list of paths to the
> >> >> “dfs.datanode.data.dir”
> >> >> > property in your hdfs-site.xml
> >> >> >
> >> >> > mn
> >> >> >
> >> >> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz <na...@gmail.com>
> >> >> > wrote:
> >> >> >
> >> >> >> Hi
> >> >> >>
> >> >> >> I am facing some space issue when I saving file into HDFS and/or
> >> >> >> running
> >> >> >> map reduce job.
> >> >> >>
> >> >> >> root@nn:~# df -h
> >> >> >> Filesystem                                       Size  Used Avail
> >> Use%
> >> >> >> Mounted on
> >> >> >> /dev/xvda2                                       5.9G  5.9G     0
> >> 100%
> >> >> >> /
> >> >> >> udev                                              98M  4.0K   98M
> >>  1%
> >> >> >> /dev
> >> >> >> tmpfs                                             48M  192K   48M
> >>  1%
> >> >> >> /run
> >> >> >> none                                             5.0M     0  5.0M
> >>  0%
> >> >> >> /run/lock
> >> >> >> none                                             120M     0  120M
> >>  0%
> >> >> >> /run/shm
> >> >> >> overflow                                         1.0M  4.0K 1020K
> >>  1%
> >> >> >> /tmp
> >> >> >> /dev/xvda4                                       7.9G  147M  7.4G
> >>  2%
> >> >> >> /mnt
> >> >> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G   75G
> >> 59%
> >> >> >> /groups/ch-geni-net/Hadoop-NET
> >> >> >> 172.17.253.254:/q/proj/ch-geni-net               198G  108G   75G
> >> 59%
> >> >> >> /proj/ch-geni-net
> >> >> >> root@nn:~#
> >> >> >>
> >> >> >>
> >> >> >> I can see there is no space left on /dev/xvda2.
> >> >> >>
> >> >> >> How can I make hadoop to see newly mounted /dev/xvda4 ? Or do I
> >> >> >> need
> >> >> >> to
> >> >> >> move the file manually from /dev/xvda2 to xvda4 ?
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> Thanks & Regards,
> >> >> >>
> >> >> >> Abdul Navaz
> >> >> >> Research Assistant
> >> >> >> University of Houston Main Campus, Houston TX
> >> >> >> Ph: 281-685-0388
> >> >> >>
> >> >> >
> >> >> >
> >> >>
> >> >
> >>
> >
>

Re: No space when running a hadoop job

Posted by Susheel Kumar Gadalay <sk...@gmail.com>.
Thanks, Aitor.

That is my observation too.

I added a new disk location and manually moved some files.

But if two locations are given for dfs.datanode.data.dir from the
beginning, will Hadoop balance the disk usage, even if not perfectly
because file sizes may differ?

On 9/29/14, Aitor Cedres <ac...@pivotal.io> wrote:
> Hi Susheel,
>
> Adding a new directory to “dfs.datanode.data.dir” will not balance your
> disks straightforward. Eventually, by HDFS activity (deleting/invalidating
> some block, writing new ones), the disks will become balanced. If you want
> to balance them right after adding the new disk and changing the
> “dfs.datanode.data.dir”
> value, you have to shutdown the DN and manually move (mv) some files in the
> old directory to the new one.
>
> The balancer will try to balance the usage between HDFS nodes, but it won't
> care about "internal" node disks utilization. For your particular case, the
> balancer won't fix your issue.
>
> Hope it helps,
> Aitor
>
> On 29 September 2014 05:53, Susheel Kumar Gadalay <sk...@gmail.com>
> wrote:
>
>> You mean if multiple directory locations are given, Hadoop will
>> balance the distribution of files across these different directories.
>>
>> But normally we start with 1 directory location and once it is
>> reaching the maximum, we add new directory.
>>
>> In this case how can we balance the distribution of files?
>>
>> One way is to list the files and move.
>>
>> Will start balance script will work?
>>
>> On 9/27/14, Alexander Pivovarov <ap...@gmail.com> wrote:
>> > It can read/write in parallel to all drives. More hdd more io speed.
>> >  On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay" <sk...@gmail.com>
>> > wrote:
>> >
>> >> Correct me if I am wrong.
>> >>
>> >> Adding multiple directories will not balance the files distributions
>> >> across these locations.
>> >>
>> >> Hadoop will add exhaust the first directory and then start using the
>> >> next, next ..
>> >>
>> >> How can I tell Hadoop to evenly balance across these directories.
>> >>
>> >> On 9/26/14, Matt Narrell <ma...@gmail.com> wrote:
>> >> > You can add a comma separated list of paths to the
>> >> “dfs.datanode.data.dir”
>> >> > property in your hdfs-site.xml
>> >> >
>> >> > mn
>> >> >
>> >> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz <na...@gmail.com>
>> >> > wrote:
>> >> >
>> >> >> Hi
>> >> >>
>> >> >> I am facing some space issue when I saving file into HDFS and/or
>> >> >> running
>> >> >> map reduce job.
>> >> >>
>> >> >> root@nn:~# df -h
>> >> >> Filesystem                                       Size  Used Avail
>> Use%
>> >> >> Mounted on
>> >> >> /dev/xvda2                                       5.9G  5.9G     0
>> 100%
>> >> >> /
>> >> >> udev                                              98M  4.0K   98M
>>  1%
>> >> >> /dev
>> >> >> tmpfs                                             48M  192K   48M
>>  1%
>> >> >> /run
>> >> >> none                                             5.0M     0  5.0M
>>  0%
>> >> >> /run/lock
>> >> >> none                                             120M     0  120M
>>  0%
>> >> >> /run/shm
>> >> >> overflow                                         1.0M  4.0K 1020K
>>  1%
>> >> >> /tmp
>> >> >> /dev/xvda4                                       7.9G  147M  7.4G
>>  2%
>> >> >> /mnt
>> >> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G   75G
>> 59%
>> >> >> /groups/ch-geni-net/Hadoop-NET
>> >> >> 172.17.253.254:/q/proj/ch-geni-net               198G  108G   75G
>> 59%
>> >> >> /proj/ch-geni-net
>> >> >> root@nn:~#
>> >> >>
>> >> >>
>> >> >> I can see there is no space left on /dev/xvda2.
>> >> >>
>> >> >> How can I make hadoop to see newly mounted /dev/xvda4 ? Or do I
>> >> >> need
>> >> >> to
>> >> >> move the file manually from /dev/xvda2 to xvda4 ?
>> >> >>
>> >> >>
>> >> >>
>> >> >> Thanks & Regards,
>> >> >>
>> >> >> Abdul Navaz
>> >> >> Research Assistant
>> >> >> University of Houston Main Campus, Houston TX
>> >> >> Ph: 281-685-0388
>> >> >>
>> >> >
>> >> >
>> >>
>> >
>>
>

Re: No space when running a hadoop job

Posted by Aitor Cedres <ac...@pivotal.io>.
Hi Susheel,

Adding a new directory to “dfs.datanode.data.dir” will not balance your
disks straight away. Eventually, through HDFS activity (deleting/invalidating
some blocks, writing new ones), the disks will become balanced. If you want
to balance them right after adding the new disk and changing the
“dfs.datanode.data.dir” value, you have to shut down the DN and manually
move (mv) some files from the old directory to the new one.

The balancer will try to balance the usage between HDFS nodes, but it won't
care about "internal" disk utilization within a node. For your particular
case, the balancer won't fix your issue.

Hope it helps,
Aitor
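
The shut-down-and-move procedure described above might look roughly like
the sketch below. The paths and the block-pool name (BP-1234-example) are
hypothetical, the daemon script assumes a plain tarball install, and block
files must keep the same BP-xxx/current/finalized layout on both disks:

```shell
# Sketch only: stop the DataNode before touching any block files
hadoop-daemon.sh stop datanode

# Move some finalized block subdirectories from the full disk to the new
# one, preserving the block-pool directory structure on both sides
mv /data/1/dfs/dn/current/BP-1234-example/current/finalized/subdir0 \
   /data/2/dfs/dn/current/BP-1234-example/current/finalized/

# Restart; the DataNode rescans its data dirs and re-reports its blocks
hadoop-daemon.sh start datanode
```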

On 29 September 2014 05:53, Susheel Kumar Gadalay <sk...@gmail.com>
wrote:

> You mean if multiple directory locations are given, Hadoop will
> balance the distribution of files across these different directories.
>
> But normally we start with 1 directory location and once it is
> reaching the maximum, we add new directory.
>
> In this case how can we balance the distribution of files?
>
> One way is to list the files and move.
>
> Will start balance script will work?
>
> On 9/27/14, Alexander Pivovarov <ap...@gmail.com> wrote:
> > It can read/write in parallel to all drives. More hdd more io speed.
> >  On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay" <sk...@gmail.com>
> > wrote:
> >
> >> Correct me if I am wrong.
> >>
> >> Adding multiple directories will not balance the files distributions
> >> across these locations.
> >>
> >> Hadoop will add exhaust the first directory and then start using the
> >> next, next ..
> >>
> >> How can I tell Hadoop to evenly balance across these directories.
> >>
> >> On 9/26/14, Matt Narrell <ma...@gmail.com> wrote:
> >> > You can add a comma separated list of paths to the
> >> “dfs.datanode.data.dir”
> >> > property in your hdfs-site.xml
> >> >
> >> > mn
> >> >
> >> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz <na...@gmail.com> wrote:
> >> >
> >> >> Hi
> >> >>
> >> >> I am facing some space issue when I saving file into HDFS and/or
> >> >> running
> >> >> map reduce job.
> >> >>
> >> >> root@nn:~# df -h
> >> >> Filesystem                                       Size  Used Avail
> Use%
> >> >> Mounted on
> >> >> /dev/xvda2                                       5.9G  5.9G     0
> 100%
> >> >> /
> >> >> udev                                              98M  4.0K   98M
>  1%
> >> >> /dev
> >> >> tmpfs                                             48M  192K   48M
>  1%
> >> >> /run
> >> >> none                                             5.0M     0  5.0M
>  0%
> >> >> /run/lock
> >> >> none                                             120M     0  120M
>  0%
> >> >> /run/shm
> >> >> overflow                                         1.0M  4.0K 1020K
>  1%
> >> >> /tmp
> >> >> /dev/xvda4                                       7.9G  147M  7.4G
>  2%
> >> >> /mnt
> >> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G   75G
> 59%
> >> >> /groups/ch-geni-net/Hadoop-NET
> >> >> 172.17.253.254:/q/proj/ch-geni-net               198G  108G   75G
> 59%
> >> >> /proj/ch-geni-net
> >> >> root@nn:~#
> >> >>
> >> >>
> >> >> I can see there is no space left on /dev/xvda2.
> >> >>
> >> >> How can I make hadoop to see newly mounted /dev/xvda4 ? Or do I need
> >> >> to
> >> >> move the file manually from /dev/xvda2 to xvda4 ?
> >> >>
> >> >>
> >> >>
> >> >> Thanks & Regards,
> >> >>
> >> >> Abdul Navaz
> >> >> Research Assistant
> >> >> University of Houston Main Campus, Houston TX
> >> >> Ph: 281-685-0388
> >> >>
> >> >
> >> >
> >>
> >
>

> >> >> Ph: 281-685-0388
> >> >>
> >> >
> >> >
> >>
> >
>

Re: No space when running a hadoop job

Posted by Susheel Kumar Gadalay <sk...@gmail.com>.
You mean if multiple directory locations are given, Hadoop will
balance the distribution of files across these different directories.

But normally we start with one directory location, and once it is
reaching its capacity, we add a new directory.

In this case how can we balance the distribution of files?

One way is to list the files and move them.

Will the start-balancer.sh script work?
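
For reference, the balancer script in question is invoked like this; as noted elsewhere in the thread, it evens out usage between DataNodes, not between the disks of a single node, so it would not fix a single full disk (the threshold value is illustrative):

```
hdfs balancer -threshold 10   # rebalance until each node is within 10% of cluster-average usage
```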

On 9/27/14, Alexander Pivovarov <ap...@gmail.com> wrote:
> It can read/write in parallel to all drives. More hdd more io speed.
>  On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay" <sk...@gmail.com>
> wrote:
>
>> Correct me if I am wrong.
>>
>> Adding multiple directories will not balance the files distributions
>> across these locations.
>>
>> Hadoop will add exhaust the first directory and then start using the
>> next, next ..
>>
>> How can I tell Hadoop to evenly balance across these directories.
>>
>> On 9/26/14, Matt Narrell <ma...@gmail.com> wrote:
>> > You can add a comma separated list of paths to the
>> “dfs.datanode.data.dir”
>> > property in your hdfs-site.xml
>> >
>> > mn
>> >
>> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz <na...@gmail.com> wrote:
>> >
>> >> Hi
>> >>
>> >> I am facing some space issue when I saving file into HDFS and/or
>> >> running
>> >> map reduce job.
>> >>
>> >> root@nn:~# df -h
>> >> Filesystem                                       Size  Used Avail Use%
>> >> Mounted on
>> >> /dev/xvda2                                       5.9G  5.9G     0 100%
>> >> /
>> >> udev                                              98M  4.0K   98M   1%
>> >> /dev
>> >> tmpfs                                             48M  192K   48M   1%
>> >> /run
>> >> none                                             5.0M     0  5.0M   0%
>> >> /run/lock
>> >> none                                             120M     0  120M   0%
>> >> /run/shm
>> >> overflow                                         1.0M  4.0K 1020K   1%
>> >> /tmp
>> >> /dev/xvda4                                       7.9G  147M  7.4G   2%
>> >> /mnt
>> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G   75G  59%
>> >> /groups/ch-geni-net/Hadoop-NET
>> >> 172.17.253.254:/q/proj/ch-geni-net               198G  108G   75G  59%
>> >> /proj/ch-geni-net
>> >> root@nn:~#
>> >>
>> >>
>> >> I can see there is no space left on /dev/xvda2.
>> >>
>> >> How can I make hadoop to see newly mounted /dev/xvda4 ? Or do I need
>> >> to
>> >> move the file manually from /dev/xvda2 to xvda4 ?
>> >>
>> >>
>> >>
>> >> Thanks & Regards,
>> >>
>> >> Abdul Navaz
>> >> Research Assistant
>> >> University of Houston Main Campus, Houston TX
>> >> Ph: 281-685-0388
>> >>
>> >
>> >
>>
>

Re: No space when running a hadoop job

Posted by Alexander Pivovarov <ap...@gmail.com>.
It can read/write in parallel to all drives. More HDDs means more I/O throughput.
 On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay" <sk...@gmail.com>
wrote:

> Correct me if I am wrong.
>
> Adding multiple directories will not balance the files distributions
> across these locations.
>
> Hadoop will add exhaust the first directory and then start using the
> next, next ..
>
> How can I tell Hadoop to evenly balance across these directories.
>
> On 9/26/14, Matt Narrell <ma...@gmail.com> wrote:
> > You can add a comma separated list of paths to the
> “dfs.datanode.data.dir”
> > property in your hdfs-site.xml
> >
> > mn
> >
> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz <na...@gmail.com> wrote:
> >
> >> Hi
> >>
> >> I am facing some space issue when I saving file into HDFS and/or running
> >> map reduce job.
> >>
> >> root@nn:~# df -h
> >> Filesystem                                       Size  Used Avail Use%
> >> Mounted on
> >> /dev/xvda2                                       5.9G  5.9G     0 100% /
> >> udev                                              98M  4.0K   98M   1%
> >> /dev
> >> tmpfs                                             48M  192K   48M   1%
> >> /run
> >> none                                             5.0M     0  5.0M   0%
> >> /run/lock
> >> none                                             120M     0  120M   0%
> >> /run/shm
> >> overflow                                         1.0M  4.0K 1020K   1%
> >> /tmp
> >> /dev/xvda4                                       7.9G  147M  7.4G   2%
> >> /mnt
> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G   75G  59%
> >> /groups/ch-geni-net/Hadoop-NET
> >> 172.17.253.254:/q/proj/ch-geni-net               198G  108G   75G  59%
> >> /proj/ch-geni-net
> >> root@nn:~#
> >>
> >>
> >> I can see there is no space left on /dev/xvda2.
> >>
> >> How can I make hadoop to see newly mounted /dev/xvda4 ? Or do I need to
> >> move the file manually from /dev/xvda2 to xvda4 ?
> >>
> >>
> >>
> >> Thanks & Regards,
> >>
> >> Abdul Navaz
> >> Research Assistant
> >> University of Houston Main Campus, Houston TX
> >> Ph: 281-685-0388
> >>
> >
> >
>

Re: No space when running a hadoop job

Posted by Susheel Kumar Gadalay <sk...@gmail.com>.
Correct me if I am wrong.

Adding multiple directories will not balance the file distribution
across these locations.

Hadoop will exhaust the first directory and then start using the
next, and so on.

How can I tell Hadoop to evenly balance across these directories?

On 9/26/14, Matt Narrell <ma...@gmail.com> wrote:
> You can add a comma separated list of paths to the “dfs.datanode.data.dir”
> property in your hdfs-site.xml
>
> mn
>
> On Sep 26, 2014, at 8:37 AM, Abdul Navaz <na...@gmail.com> wrote:
>
>> Hi
>>
>> I am facing some space issue when I saving file into HDFS and/or running
>> map reduce job.
>>
>> root@nn:~# df -h
>> Filesystem                                       Size  Used Avail Use%
>> Mounted on
>> /dev/xvda2                                       5.9G  5.9G     0 100% /
>> udev                                              98M  4.0K   98M   1%
>> /dev
>> tmpfs                                             48M  192K   48M   1%
>> /run
>> none                                             5.0M     0  5.0M   0%
>> /run/lock
>> none                                             120M     0  120M   0%
>> /run/shm
>> overflow                                         1.0M  4.0K 1020K   1%
>> /tmp
>> /dev/xvda4                                       7.9G  147M  7.4G   2%
>> /mnt
>> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G   75G  59%
>> /groups/ch-geni-net/Hadoop-NET
>> 172.17.253.254:/q/proj/ch-geni-net               198G  108G   75G  59%
>> /proj/ch-geni-net
>> root@nn:~#
>>
>>
>> I can see there is no space left on /dev/xvda2.
>>
>> How can I make hadoop to see newly mounted /dev/xvda4 ? Or do I need to
>> move the file manually from /dev/xvda2 to xvda4 ?
>>
>>
>>
>> Thanks & Regards,
>>
>> Abdul Navaz
>> Research Assistant
>> University of Houston Main Campus, Houston TX
>> Ph: 281-685-0388
>>
>
>

Re: No space when running a hadoop job

Posted by Matt Narrell <ma...@gmail.com>.
You can add a comma-separated list of paths to the “dfs.datanode.data.dir” property in your hdfs-site.xml.
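
For example (the paths below are illustrative; substitute the mount points
on your own nodes), the property would look something like:

```xml
<!-- hdfs-site.xml: store DataNode blocks on two volumes.
     Paths are placeholders; each must exist and be writable by the
     user running the DataNode. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/mnt/hdfs/data1,/mnt/hdfs/data2</value>
</property>
```

You will need to restart the DataNode for the change to take effect, and
blocks already written under the old directory are not moved automatically.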

mn

On Sep 26, 2014, at 8:37 AM, Abdul Navaz <na...@gmail.com> wrote:

> Hi
> 
> I am facing some space issue when I saving file into HDFS and/or running map reduce job.
> 
> root@nn:~# df -h
> Filesystem                                       Size  Used Avail Use% Mounted on
> /dev/xvda2                                       5.9G  5.9G     0 100% /
> udev                                              98M  4.0K   98M   1% /dev
> tmpfs                                             48M  192K   48M   1% /run
> none                                             5.0M     0  5.0M   0% /run/lock
> none                                             120M     0  120M   0% /run/shm
> overflow                                         1.0M  4.0K 1020K   1% /tmp
> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G   75G  59% /groups/ch-geni-net/Hadoop-NET
> 172.17.253.254:/q/proj/ch-geni-net               198G  108G   75G  59% /proj/ch-geni-net
> root@nn:~# 
> 
> 
> I can see there is no space left on /dev/xvda2.
> 
> How can I make hadoop to see newly mounted /dev/xvda4 ? Or do I need to move the file manually from /dev/xvda2 to xvda4 ?
> 
> 
> 
> Thanks & Regards,
> 
> Abdul Navaz
> Research Assistant
> University of Houston Main Campus, Houston TX
> Ph: 281-685-0388
> 
