Posted to hdfs-user@hadoop.apache.org by Shashi Vishwakarma <sh...@gmail.com> on 2015/10/31 13:46:38 UTC

Utility to push data into HDFS

Hi

I need to build a common utility for Unix/Windows-based systems to push data
into a Hadoop cluster. Users should be able to run the utility from any platform
and push data into HDFS.

Any suggestions ?

Thanks

Shashi
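
One platform-neutral way to do this is the Hadoop FileSystem Java API, since the
same client JAR runs on both Unix and Windows JVMs. A minimal sketch, assuming the
cluster's client configuration is reachable (the NameNode URI below is a placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsPush {
        // Usage: java HdfsPush <local-file> <hdfs-target-path>
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder NameNode URI; normally picked up from core-site.xml on the classpath.
            conf.set("fs.defaultFS", "hdfs://namenode:8020");
            try (FileSystem fs = FileSystem.get(conf)) {
                // Streams the local file to the target path in HDFS.
                fs.copyFromLocalFile(new Path(args[0]), new Path(args[1]));
            }
        }
    }

Where installing Hadoop client libraries on every machine is impractical, the
WebHDFS REST interface over HTTP is a common alternative.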

Re: hadoop not using whole disk for HDFS

Posted by "Adaryl \"Bob\" Wakefield, MBA" <ad...@hotmail.com>.
dfs.datanode.data.dir  = /hadoop/hdfs/data,/hdfs/data

Data node 1:
      Filesystem Size Used Avail Use% Mounted on 
      /dev/mapper/centos-root 50G 12G 39G 23% / 
      devtmpfs 16G 0 16G 0% /dev 
      tmpfs 16G 0 16G 0% /dev/shm 
      tmpfs 16G 1.4G 15G 9% /run 
      tmpfs 16G 0 16G 0% /sys/fs/cgroup 
      /dev/sda2 494M 123M 372M 25% /boot 
      /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home 


data node 2:
      Filesystem Size Used Avail Use% Mounted on 
      /dev/mapper/centos-root 50G 24G 27G 48% / 
      devtmpfs 16G 0 16G 0% /dev 
      tmpfs 16G 24K 16G 1% /dev/shm 
      tmpfs 16G 97M 16G 1% /run 
      tmpfs 16G 0 16G 0% /sys/fs/cgroup 
      /dev/sda2 494M 124M 370M 26% /boot 
      /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home 



Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: iain wright 
Sent: Thursday, November 05, 2015 7:56 PM
To: user@hadoop.apache.org 
Subject: Re: hadoop not using whole disk for HDFS

Please post:  
- output of df -h from every datanode in your cluster
- what dfs.datanode.data.dir is currently set to

-- 

Iain Wright



This email message is confidential, intended only for the recipient(s) named above and may contain information that is privileged, exempt from disclosure under applicable law. If you are not the intended recipient, do not disclose or disseminate the message to anyone except the intended recipient. If you have received this message in error, or are not the named recipient(s), please immediately notify the sender by return email, and delete all copies of this message.


On Thu, Nov 5, 2015 at 5:24 PM, Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com> wrote:

  Is there a maximum amount of disk space that HDFS will use? Is 100GB that max? When we’re supposed to be dealing with “big data” why is the amount of data to be held on any one box such a small number when you’ve got terabytes available?

  Adaryl "Bob" Wakefield, MBA
  Principal
  Mass Street Analytics, LLC
  913.938.6685
  www.linkedin.com/in/bobwakefieldmba
  Twitter: @BobLovesData

  From: Adaryl "Bob" Wakefield, MBA 
  Sent: Wednesday, November 04, 2015 4:38 PM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  This is an experimental cluster and there isn’t anything I can’t lose. I ran into some issues. I’m running the Hortonworks distro and am managing things through Ambari. 

  1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.
  2. When I restarted, the space available increased by a whopping 100GB.



  Adaryl "Bob" Wakefield, MBA
  Principal
  Mass Street Analytics, LLC
  913.938.6685
  www.linkedin.com/in/bobwakefieldmba
  Twitter: @BobLovesData

  From: Naganarasimha G R (Naga) 
  Sent: Wednesday, November 04, 2015 4:26 PM
  To: user@hadoop.apache.org 
  Subject: RE: hadoop not using whole disk for HDFS

  Better would be to stop the daemons, copy the data from /hadoop/hdfs/data to /home/hdfs/data, reconfigure dfs.datanode.data.dir to /home/hdfs/data, and then start the daemons, provided the amount of data is comparatively small.

  Ensure you have a backup if you have any critical data!



  Regards,

  + Naga


------------------------------------------------------------------------------

  From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
  Sent: Thursday, November 05, 2015 03:40
  To: user@hadoop.apache.org
  Subject: Re: hadoop not using whole disk for HDFS


  So like I can just create a new folder in the home directory like:
  home/hdfs/data
  and then set dfs.datanode.data.dir to:
  /hadoop/hdfs/data,home/hdfs/data

  Restart the node and that should do it correct?

  Adaryl "Bob" Wakefield, MBA
  Principal
  Mass Street Analytics, LLC
  913.938.6685
  www.linkedin.com/in/bobwakefieldmba
  Twitter: @BobLovesData

  From: Naganarasimha G R (Naga) 
  Sent: Wednesday, November 04, 2015 3:59 PM
  To: user@hadoop.apache.org 
  Subject: RE: hadoop not using whole disk for HDFS

  Hi Bob,



  It seems you have configured the data dir to be something other than a folder in /home. If so, try creating another folder and adding it to "dfs.datanode.data.dir", separated by a comma, instead of trying to reset the default.

  It is also advised not to configure the root partition "/" as an HDFS data dir; if directory usage hits the maximum, the OS might fail to function properly.
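
  As a sketch of that comma-separated form, the entry in hdfs-site.xml would look like the following (the second directory below is only a placeholder for a folder on the larger mount):

  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/hadoop/hdfs/data,/data1/hdfs/data</value>
    <description>Comma-separated list of local directories in which the DataNode
    stores block replicas; capacity is counted per directory's underlying volume.
    </description>
  </property>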



  Regards,

  + Naga


------------------------------------------------------------------------------

  From: P lva [ruvikal@gmail.com]
  Sent: Thursday, November 05, 2015 03:11
  To: user@hadoop.apache.org
  Subject: Re: hadoop not using whole disk for HDFS


  What does your dfs.datanode.data.dir point to ?



  On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com> wrote:

          Filesystem Size Used Avail Use% Mounted on 
          /dev/mapper/centos-root 50G 12G 39G 23% / 
          devtmpfs 16G 0 16G 0% /dev 
          tmpfs 16G 0 16G 0% /dev/shm 
          tmpfs 16G 1.4G 15G 9% /run 
          tmpfs 16G 0 16G 0% /sys/fs/cgroup 
          /dev/sda2 494M 123M 372M 25% /boot 
          /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home 


    That’s from one datanode. The second one is nearly identical. I discovered that 50GB is actually a default. That seems really weird. Disk space is cheap. Why would you not just use most of the disk and why is it so hard to reset the default?

    Adaryl "Bob" Wakefield, MBA
    Principal
    Mass Street Analytics, LLC
    913.938.6685
    www.linkedin.com/in/bobwakefieldmba
    Twitter: @BobLovesData

    From: Chris Nauroth 
    Sent: Wednesday, November 04, 2015 12:16 PM
    To: user@hadoop.apache.org 
    Subject: Re: hadoop not using whole disk for HDFS

    How are those drives partitioned?  Is it possible that the directories pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on partitions that are sized to only 100 GB?  Running commands like df would be a good way to check this at the OS level, independently of Hadoop.

    --Chris Nauroth

    From: MBA <ad...@hotmail.com>
    Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
    Date: Tuesday, November 3, 2015 at 11:16 AM
    To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
    Subject: Re: hadoop not using whole disk for HDFS


    Yeah. It has the current value of 1073741824 which is like 1.07 gig.

    B.
    From: Chris Nauroth 
    Sent: Tuesday, November 03, 2015 11:57 AM
    To: user@hadoop.apache.org 
    Subject: Re: hadoop not using whole disk for HDFS

    Hi Bob,

    Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

    <property>
      <name>dfs.datanode.du.reserved</name>
      <value>0</value>
      <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
      </description>
    </property>

    --Chris Nauroth

    From: MBA <ad...@hotmail.com>
    Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
    Date: Tuesday, November 3, 2015 at 10:51 AM
    To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
    Subject: hadoop not using whole disk for HDFS


    I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere? 
    B.


Re: hadoop not using whole disk for HDFS

Posted by iain wright <ia...@gmail.com>.
Please post:
- output of df -h from every datanode in your cluster
- what dfs.datanode.data.dir is currently set to

-- 
Iain Wright

This email message is confidential, intended only for the recipient(s)
named above and may contain information that is privileged, exempt from
disclosure under applicable law. If you are not the intended recipient, do
not disclose or disseminate the message to anyone except the intended
recipient. If you have received this message in error, or are not the named
recipient(s), please immediately notify the sender by return email, and
delete all copies of this message.

On Thu, Nov 5, 2015 at 5:24 PM, Adaryl "Bob" Wakefield, MBA <
adaryl.wakefield@hotmail.com> wrote:

> Is there a maximum amount of disk space that HDFS will use? Is 100GB that
> max? When we’re supposed to be dealing with “big data” why is the amount of
> data to be held on any one box such a small number when you’ve got
> terabytes available?
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
> *From:* Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com>
> *Sent:* Wednesday, November 04, 2015 4:38 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: hadoop not using whole disk for HDFS
>
> This is an experimental cluster and there isn’t anything I can’t lose. I
> ran into some issues. I’m running the Hortonworks distro and am managing
> things through Ambari.
>
> 1. I wasn’t able to set the config to /home/hdfs/data. I got an error that
> told me I’m not allowed to set that config to the /home directory. So I
> made it /hdfs/data.
> 2. When I restarted, the space available increased by a whopping 100GB.
>
>
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
> *From:* Naganarasimha G R (Naga) <ga...@huawei.com>
> *Sent:* Wednesday, November 04, 2015 4:26 PM
> *To:* user@hadoop.apache.org
> *Subject:* RE: hadoop not using whole disk for HDFS
>
>
> Better would be to stop the daemons and copy the data from */hadoop/hdfs/data
> *to */home/hdfs/data *, reconfigure *dfs.datanode.data.dir* to */home/hdfs/data
> *and then start the daemons. If the data is comparitively less !
>
> Ensure you have the backup if have any critical data !
>
>
>
> Regards,
>
> + Naga
> ------------------------------
> *From:* Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
> *Sent:* Thursday, November 05, 2015 03:40
> *To:* user@hadoop.apache.org
> *Subject:* Re: hadoop not using whole disk for HDFS
>
> So like I can just create a new folder in the home directory like:
> home/hdfs/data
> and then set dfs.datanode.data.dir to:
> /hadoop/hdfs/data,home/hdfs/data
>
> Restart the node and that should do it correct?
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
> *From:* Naganarasimha G R (Naga) <ga...@huawei.com>
> *Sent:* Wednesday, November 04, 2015 3:59 PM
> *To:* user@hadoop.apache.org
> *Subject:* RE: hadoop not using whole disk for HDFS
>
>
> Hi Bob,
>
>
>
> Seems like you have configured to disk dir to be other than an folder in*
> /home,* if so try creating another folder and add to
> *"dfs.datanode.data.dir"* seperated by comma instead of trying to reset
> the default.
>
> And its also advised not to use the root partition "/" to be configured
> for HDFS data dir, if the Dir usage hits the maximum then OS might fail to
> function properly.
>
>
>
> Regards,
>
> + Naga
> ------------------------------
> *From:* P lva [ruvikal@gmail.com]
> *Sent:* Thursday, November 05, 2015 03:11
> *To:* user@hadoop.apache.org
> *Subject:* Re: hadoop not using whole disk for HDFS
>
> What does your dfs.datanode.data.dir point to ?
>
>
> On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <
> adaryl.wakefield@hotmail.com> wrote:
>
>> Filesystem Size Used Avail Use% Mounted on /dev/mapper/centos-root 50G
>> 12G 39G 23% / devtmpfs 16G 0 16G 0% /dev tmpfs 16G 0 16G 0% /dev/shm
>> tmpfs 16G 1.4G 15G 9% /run tmpfs 16G 0 16G 0% /sys/fs/cgroup /dev/sda2
>> 494M 123M 372M 25% /boot /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home
>>
>> That’s from one datanode. The second one is nearly identical. I
>> discovered that 50GB is actually a default. That seems really weird. Disk
>> space is cheap. Why would you not just use most of the disk and why is it
>> so hard to reset the default?
>>
>> Adaryl "Bob" Wakefield, MBA
>> Principal
>> Mass Street Analytics, LLC
>> 913.938.6685
>> www.linkedin.com/in/bobwakefieldmba
>> Twitter: @BobLovesData
>>
>> *From:* Chris Nauroth <cn...@hortonworks.com>
>> *Sent:* Wednesday, November 04, 2015 12:16 PM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: hadoop not using whole disk for HDFS
>>
>> How are those drives partitioned?  Is it possible that the directories
>> pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on
>> partitions that are sized to only 100 GB?  Running commands like df would
>> be a good way to check this at the OS level, independently of Hadoop.
>>
>> --Chris Nauroth
>>
>> From: MBA <ad...@hotmail.com>
>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Date: Tuesday, November 3, 2015 at 11:16 AM
>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Subject: Re: hadoop not using whole disk for HDFS
>>
>> Yeah. It has the current value of 1073741824 which is like 1.07 gig.
>>
>> B.
>> *From:* Chris Nauroth <cn...@hortonworks.com>
>> *Sent:* Tuesday, November 03, 2015 11:57 AM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: hadoop not using whole disk for HDFS
>>
>> Hi Bob,
>>
>> Does the hdfs-site.xml configuration file contain the property
>> dfs.datanode.du.reserved?  If this is defined, then the DataNode
>> intentionally will not use this space for storage of replicas.
>>
>> <property>
>>   <name>dfs.datanode.du.reserved</name>
>>   <value>0</value>
>>   <description>Reserved space in bytes per volume. Always leave this much
>> space free for non dfs use.
>>   </description>
>> </property>
>>
>> --Chris Nauroth
>>
>> From: MBA <ad...@hotmail.com>
>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Date: Tuesday, November 3, 2015 at 10:51 AM
>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Subject: hadoop not using whole disk for HDFS
>>
>> I’ve got the Hortonworks distro running on a three node cluster. For some
>> reason the disk available for HDFS is MUCH less than the total disk space.
>> Both of my data nodes have 3TB hard drives. Only 100GB of that is being
>> used for HDFS. Is it possible that I have a setting wrong somewhere?
>>
>> B.
>>
>
>

Re: hadoop not using whole disk for HDFS

Posted by "Adaryl \"Bob\" Wakefield, MBA" <ad...@hotmail.com>.
So when you say remount, what exactly am I remounting? /dev/mapper/centos-home?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Thursday, November 05, 2015 10:04 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Thanks Brahma, I didn't realize he might have configured both directories; I was assuming Bob had configured a single new directory, "/hdfs/data".
So it is only virtually showing additional space.
Manually try to add a data dir in /home for your use case, and restart the datanodes.
Not sure about the impacts in Ambari, but it is worth a try; a more permanent solution would be to remount:
      Filesystem Size Used Avail Use% Mounted on 
      /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home 


--------------------------------------------------------------------------------

From: Brahma Reddy Battula [brahmareddy.battula@huawei.com]
Sent: Friday, November 06, 2015 08:19
To: user@hadoop.apache.org
Subject: RE: hadoop not using whole disk for HDFS



For each configured dfs.datanode.data.dir, HDFS assumes it is on a separate partition and counts its capacity separately. So when another dir, /hdfs/data, was added, HDFS assumed a new partition had been added and increased the capacity by 50GB per node, i.e. 100GB for 2 nodes.

Not allowing the /home directory to be configured for data.dir might be Ambari's constraint; instead, you can manually try to add a data dir in /home for your use case and restart the datanodes.





Thanks & Regards

 Brahma Reddy Battula







--------------------------------------------------------------------------------

From: Naganarasimha G R (Naga) [garlanaganarasimha@huawei.com]
Sent: Friday, November 06, 2015 7:20 AM
To: user@hadoop.apache.org
Subject: RE: hadoop not using whole disk for HDFS


Hi Bob,



1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.

Naga : I am not sure about the HDP distro, but if you make it point to /hdfs/data, it will still be pointing to the root mount itself, i.e.

          /dev/mapper/centos-root 50G 12G 39G 23% / 


Another alternative is to mount the drive to some folder other than /home and then try.



2. When I restarted, the space available increased by a whopping 100GB.
Naga : I am not sure how this happened; maybe you can recheck. If you run the command "df -h <path of the configured data dir>", you will find out how much disk space is available on the mount for which the path is configured.



Regards,

+ Naga








--------------------------------------------------------------------------------

From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Friday, November 06, 2015 06:54
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


Is there a maximum amount of disk space that HDFS will use? Is 100GB that max? When we’re supposed to be dealing with “big data” why is the amount of data to be held on any one box such a small number when you’ve got terabytes available?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Adaryl "Bob" Wakefield, MBA 
Sent: Wednesday, November 04, 2015 4:38 PM
To: user@hadoop.apache.org 
Subject: Re: hadoop not using whole disk for HDFS

This is an experimental cluster and there isn’t anything I can’t lose. I ran into some issues. I’m running the Hortonworks distro and am managing things through Ambari. 

1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.
2. When I restarted, the space available increased by a whopping 100GB.



Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Wednesday, November 04, 2015 4:26 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Better would be to stop the daemons, copy the data from /hadoop/hdfs/data to /home/hdfs/data, reconfigure dfs.datanode.data.dir to /home/hdfs/data, and then start the daemons, provided the amount of data is comparatively small.

Ensure you have a backup if you have any critical data!



Regards,

+ Naga


--------------------------------------------------------------------------------

From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Thursday, November 05, 2015 03:40
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


So like I can just create a new folder in the home directory like:
home/hdfs/data
and then set dfs.datanode.data.dir to:
/hadoop/hdfs/data,home/hdfs/data

Restart the node and that should do it correct?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Wednesday, November 04, 2015 3:59 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Hi Bob,



It seems you have configured the data dir to be something other than a folder in /home. If so, try creating another folder and adding it to "dfs.datanode.data.dir", separated by a comma, instead of trying to reset the default.

It is also advised not to configure the root partition "/" as an HDFS data dir; if directory usage hits the maximum, the OS might fail to function properly.



Regards,

+ Naga


--------------------------------------------------------------------------------

From: P lva [ruvikal@gmail.com]
Sent: Thursday, November 05, 2015 03:11
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


What does your dfs.datanode.data.dir point to ?



On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com> wrote:

        Filesystem Size Used Avail Use% Mounted on 
        /dev/mapper/centos-root 50G 12G 39G 23% / 
        devtmpfs 16G 0 16G 0% /dev 
        tmpfs 16G 0 16G 0% /dev/shm 
        tmpfs 16G 1.4G 15G 9% /run 
        tmpfs 16G 0 16G 0% /sys/fs/cgroup 
        /dev/sda2 494M 123M 372M 25% /boot 
        /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home 


  That’s from one datanode. The second one is nearly identical. I discovered that 50GB is actually a default. That seems really weird. Disk space is cheap. Why would you not just use most of the disk and why is it so hard to reset the default?

  Adaryl "Bob" Wakefield, MBA
  Principal
  Mass Street Analytics, LLC
  913.938.6685
  www.linkedin.com/in/bobwakefieldmba
  Twitter: @BobLovesData

  From: Chris Nauroth 
  Sent: Wednesday, November 04, 2015 12:16 PM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  How are those drives partitioned?  Is it possible that the directories pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on partitions that are sized to only 100 GB?  Running commands like df would be a good way to check this at the OS level, independently of Hadoop.

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 11:16 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: Re: hadoop not using whole disk for HDFS


  Yeah. It has the current value of 1073741824 which is like 1.07 gig.

  B.
  From: Chris Nauroth 
  Sent: Tuesday, November 03, 2015 11:57 AM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  Hi Bob,

  Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>0</value>
    <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
    </description>
  </property>

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 10:51 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: hadoop not using whole disk for HDFS


  I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere? 
  B.

Re: hadoop not using whole disk for HDFS

Posted by Namikaze Minato <ll...@gmail.com>.
I hope you understand that you sent 5 emails to several hundred (thousand?)
people in the world in 15 minutes... Please think before hitting this
"send" button.

In Unix (and Windows) you can mount a drive into a folder. This just means
that the disk is accessible from that folder; mounting a 2 TB drive in /home
does not increase the capacity of /, nor does it use any space on / to do so.
Think of / as one drive, which contains everything EXCEPT /home and is, for
example, 50GB big, while /home is another drive which is 2TB big.

What you need is to make your Hadoop installation understand that it should use
/home (to be precise, a folder in /home, not the complete partition) as HDFS
storage space. Now I will let the other people in the thread discuss with
you the technicalities of setting that parameter in the right config
file, as I don't have the knowledge about this specific matter.

Regards,
LLoyd

On 8 November 2015 at 00:00, Adaryl "Bob" Wakefield, MBA <
adaryl.wakefield@hotmail.com> wrote:

> No it’s flat out saying that that config cannot be set with anything
> starting with /home.
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
> *From:* Naganarasimha G R (Naga) <ga...@huawei.com>
> *Sent:* Thursday, November 05, 2015 10:58 PM
> *To:* user@hadoop.apache.org
> *Subject:* RE: hadoop not using whole disk for HDFS
>
> Hi Bob,
>
> I am suspecting Ambari would not be allowing to create a folder directly
> under */home*, might be it will allow */home/<user_name>/hdfs*, since
> directories under /home is expected to be users home dir.
>
> Regards,
> + Naga
> ------------------------------
> *From:* Naganarasimha G R (Naga) [garlanaganarasimha@huawei.com]
> *Sent:* Friday, November 06, 2015 09:34
> *To:* user@hadoop.apache.org
> *Subject:* RE: hadoop not using whole disk for HDFS
>
> Thanks Brahma, dint realize he might have configured both directories and
> i was assuming bob has configured single new directory "/hdfs/data".
> So virtually its showing additional space,
> *manually try to add a data dir in /home, for your usecase, and restart
> datanodes.*
> Not sure about the impacs in Ambari but worth a try! , more permanent
> solution would be better remount
> Filesystem Size Used Avail Use% Mounted on /dev/mapper/centos-home 2.7T
> 33M 2.7T 1% /home
> ------------------------------
> *From:* Brahma Reddy Battula [brahmareddy.battula@huawei.com]
> *Sent:* Friday, November 06, 2015 08:19
> *To:* user@hadoop.apache.org
> *Subject:* RE: hadoop not using whole disk for HDFS
>
>
> For each configured *dfs.datanode.data.dir* , HDFS thinks its in separate
> partiotion and counts the capacity separately. So when another dir is added
> /hdfs/data, HDFS thinks new partition is added, So it increased the
> capacity 50GB per node. i.e. 100GB for 2 Nodes.
>
> Not allowing /home directory to configure for data.dir might be ambari's
> constraint, instead you can *manually try to add a data dir* in /home,
> for your usecase, and restart datanodes.
>
>
>
> Thanks & Regards
>
>  Brahma Reddy Battula
>
>
>
>
> ------------------------------
> *From:* Naganarasimha G R (Naga) [garlanaganarasimha@huawei.com]
> *Sent:* Friday, November 06, 2015 7:20 AM
> *To:* user@hadoop.apache.org
> *Subject:* RE: hadoop not using whole disk for HDFS
>
> Hi Bob,
>
>
>
> *1. I wasn’t able to set the config to /home/hdfs/data. I got an error
> that told me I’m not allowed to set that config to the /home directory. So
> I made it /hdfs/data.*
>
> *Naga : *I am not sure about the HDP Distro but if you make it point to */hdfs/data,
> *still it will be pointing to the root mount itself i.e.
>
> *    /dev/mapper/centos-root* *50G* *12G* *39G* *23%* */*
>
> Other Alternative is to mount the drive to some other folder other than
> /home and then try.
>
>
> *2. When I restarted, the space available increased by a whopping 100GB.*
>
> *Naga : *I am particularly not sure how this happened may be you can
> again recheck if you enter the command *"df -h <path of the NM data dir
> configured>" *you will find out how much disk space is available on the
> related mount for which the path is configured.
>
>
>
> Regards,
>
> + Naga
>
>
>
>
>
>
> ------------------------------
> *From:* Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
> *Sent:* Friday, November 06, 2015 06:54
> *To:* user@hadoop.apache.org
> *Subject:* Re: hadoop not using whole disk for HDFS
>
> Is there a maximum amount of disk space that HDFS will use? Is 100GB that
> max? When we’re supposed to be dealing with “big data” why is the amount of
> data to be held on any one box such a small number when you’ve got
> terabytes available?
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
> *From:* Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com>
> *Sent:* Wednesday, November 04, 2015 4:38 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: hadoop not using whole disk for HDFS
>
> This is an experimental cluster and there isn’t anything I can’t lose. I
> ran into some issues. I’m running the Hortonworks distro and am managing
> things through Ambari.
>
> 1. I wasn’t able to set the config to /home/hdfs/data. I got an error that
> told me I’m not allowed to set that config to the /home directory. So I
> made it /hdfs/data.
> 2. When I restarted, the space available increased by a whopping 100GB.
>
>
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
> *From:* Naganarasimha G R (Naga) <ga...@huawei.com>
> *Sent:* Wednesday, November 04, 2015 4:26 PM
> *To:* user@hadoop.apache.org
> *Subject:* RE: hadoop not using whole disk for HDFS
>
>
> Better would be to stop the daemons and copy the data from */hadoop/hdfs/data
> *to */home/hdfs/data *, reconfigure *dfs.datanode.data.dir* to */home/hdfs/data
> *and then start the daemons. If the data is comparitively less !
>
> Ensure you have the backup if have any critical data !
>
>
>
> Regards,
>
> + Naga
> ------------------------------
> *From:* Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
> *Sent:* Thursday, November 05, 2015 03:40
> *To:* user@hadoop.apache.org
> *Subject:* Re: hadoop not using whole disk for HDFS
>
> So like I can just create a new folder in the home directory like:
> home/hdfs/data
> and then set dfs.datanode.data.dir to:
> /hadoop/hdfs/data,home/hdfs/data
>
> Restart the node and that should do it correct?
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
> *From:* Naganarasimha G R (Naga) <ga...@huawei.com>
> *Sent:* Wednesday, November 04, 2015 3:59 PM
> *To:* user@hadoop.apache.org
> *Subject:* RE: hadoop not using whole disk for HDFS
>
>
> Hi Bob,
>
>
>
> Seems like you have configured to disk dir to be other than an folder in*
> /home,* if so try creating another folder and add to
> *"dfs.datanode.data.dir"* seperated by comma instead of trying to reset
> the default.
>
> And its also advised not to use the root partition "/" to be configured
> for HDFS data dir, if the Dir usage hits the maximum then OS might fail to
> function properly.
>
>
>
> Regards,
>
> + Naga
> ------------------------------
> *From:* P lva [ruvikal@gmail.com]
> *Sent:* Thursday, November 05, 2015 03:11
> *To:* user@hadoop.apache.org
> *Subject:* Re: hadoop not using whole disk for HDFS
>
> What does your dfs.datanode.data.dir point to ?
>
>
> On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <
> adaryl.wakefield@hotmail.com> wrote:
>
>> Filesystem Size Used Avail Use% Mounted on /dev/mapper/centos-root 50G
>> 12G 39G 23% / devtmpfs 16G 0 16G 0% /dev tmpfs 16G 0 16G 0% /dev/shm
>> tmpfs 16G 1.4G 15G 9% /run tmpfs 16G 0 16G 0% /sys/fs/cgroup /dev/sda2
>> 494M 123M 372M 25% /boot /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home
>>
>> That’s from one datanode. The second one is nearly identical. I
>> discovered that 50GB is actually a default. That seems really weird. Disk
>> space is cheap. Why would you not just use most of the disk and why is it
>> so hard to reset the default?
>>
>> Adaryl "Bob" Wakefield, MBA
>> Principal
>> Mass Street Analytics, LLC
>> 913.938.6685
>> www.linkedin.com/in/bobwakefieldmba
>> Twitter: @BobLovesData
>>
>> *From:* Chris Nauroth <cn...@hortonworks.com>
>> *Sent:* Wednesday, November 04, 2015 12:16 PM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: hadoop not using whole disk for HDFS
>>
>> How are those drives partitioned?  Is it possible that the directories
>> pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on
>> partitions that are sized to only 100 GB?  Running commands like df would
>> be a good way to check this at the OS level, independently of Hadoop.
>>
>> --Chris Nauroth
>>
>> From: MBA <ad...@hotmail.com>
>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Date: Tuesday, November 3, 2015 at 11:16 AM
>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Subject: Re: hadoop not using whole disk for HDFS
>>
>> Yeah. It has the current value of 1073741824 which is like 1.07 gig.
>>
>> B.
>> *From:* Chris Nauroth <cn...@hortonworks.com>
>> *Sent:* Tuesday, November 03, 2015 11:57 AM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: hadoop not using whole disk for HDFS
>>
>> Hi Bob,
>>
>> Does the hdfs-site.xml configuration file contain the property
>> dfs.datanode.du.reserved?  If this is defined, then the DataNode
>> intentionally will not use this space for storage of replicas.
>>
>> <property>
>>   <name>dfs.datanode.du.reserved</name>
>>   <value>0</value>
>>   <description>Reserved space in bytes per volume. Always leave this much
>> space free for non dfs use.
>>   </description>
>> </property>
>>
>> --Chris Nauroth
>>
>> From: MBA <ad...@hotmail.com>
>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Date: Tuesday, November 3, 2015 at 10:51 AM
>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Subject: hadoop not using whole disk for HDFS
>>
>> I’ve got the Hortonworks distro running on a three node cluster. For some
>> reason the disk available for HDFS is MUCH less than the total disk space.
>> Both of my data nodes have 3TB hard drives. Only 100GB of that is being
>> used for HDFS. Is it possible that I have a setting wrong somewhere?
>>
>> B.
>>
>
>

Re: hadoop not using whole disk for HDFS

Posted by Namikaze Minato <ll...@gmail.com>.
I hope you understand that you sent 5 emails to several hundred (thousand?)
people in the world in 15 minutes... Please think before hitting this
"send" button.

In Unix (AND windows) you can mount a drive into a folder. This means just
that the disk is accessible from that folder, it does not increase the
capacity of / to mount a 2 TB drive in /home. Nor does it use any space on
/ to do so.
Just think that / is one drive, which contains everything EXCEPT /home and
is for example 50GB big and /home is another drive which is 2TB big.
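
A quick way to confirm that layout on a node is to ask df for both paths and to
list the block devices; both are standard CentOS commands, and the two mounts
are just the ones discussed in this thread:

    # each path is reported against the filesystem it actually lives on
    df -h / /home

    # list the block devices with their sizes and mount points
    lsblk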

What you need is to make your hadoop understand that it should use /home
(to be precise, a folder in /home and not the complete partition) as hdfs
storage space. Now I will let the other people in the thread discuss with
you the technicalities of setting that parameter in the right config
file, as I don't have the knowledge about this specific matter.

Regards,
LLoyd

On 8 November 2015 at 00:00, Adaryl "Bob" Wakefield, MBA <
adaryl.wakefield@hotmail.com> wrote:

> No it’s flat out saying that that config cannot be set with anything
> starting with /home.
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
> *From:* Naganarasimha G R (Naga) <ga...@huawei.com>
> *Sent:* Thursday, November 05, 2015 10:58 PM
> *To:* user@hadoop.apache.org
> *Subject:* RE: hadoop not using whole disk for HDFS
>
> Hi Bob,
>
> I suspect Ambari does not allow creating a folder directly
> under /home; it may allow /home/<user_name>/hdfs, since
> directories under /home are expected to be user home dirs.
>
> Regards,
> + Naga
> ------------------------------
> *From:* Naganarasimha G R (Naga) [garlanaganarasimha@huawei.com]
> *Sent:* Friday, November 06, 2015 09:34
> *To:* user@hadoop.apache.org
> *Subject:* RE: hadoop not using whole disk for HDFS
>
> Thanks Brahma, didn't realize he might have configured both directories; I
> was assuming Bob had configured a single new directory "/hdfs/data".
> So virtually it is showing additional space.
> Manually try to add a data dir in /home for your use case, and restart the
> datanodes.
> Not sure about the impacts in Ambari, but worth a try! A more permanent
> solution would be to remount:
> Filesystem               Size  Used  Avail  Use%  Mounted on
> /dev/mapper/centos-home  2.7T   33M   2.7T    1%  /home
> ------------------------------
> *From:* Brahma Reddy Battula [brahmareddy.battula@huawei.com]
> *Sent:* Friday, November 06, 2015 08:19
> *To:* user@hadoop.apache.org
> *Subject:* RE: hadoop not using whole disk for HDFS
>
>
> For each configured dfs.datanode.data.dir, HDFS assumes it is on a separate
> partition and counts the capacity separately. So when another dir such as
> /hdfs/data is added, HDFS thinks a new partition has been added, and the
> capacity increases by 50GB per node, i.e. 100GB for 2 nodes.
>
> Not allowing the /home directory to be configured for data.dir might be
> Ambari's constraint; instead you can manually try to add a data dir in /home
> for your use case, and restart the datanodes.
>
>
>
> Thanks & Regards
>
>  Brahma Reddy Battula
>
>
>
>
> ------------------------------
> *From:* Naganarasimha G R (Naga) [garlanaganarasimha@huawei.com]
> *Sent:* Friday, November 06, 2015 7:20 AM
> *To:* user@hadoop.apache.org
> *Subject:* RE: hadoop not using whole disk for HDFS
>
> Hi Bob,
>
>
>
> 1. I wasn’t able to set the config to /home/hdfs/data. I got an error
> that told me I’m not allowed to set that config to the /home directory. So
> I made it /hdfs/data.
>
> Naga: I am not sure about the HDP distro, but if you make it point to /hdfs/data,
> it will still be pointing to the root mount itself, i.e.
>
>     /dev/mapper/centos-root  50G  12G  39G  23%  /
>
> Another alternative is to mount the drive to some folder other than
> /home and then try.
>
>
> 2. When I restarted, the space available increased by a whopping 100GB.
>
> Naga: I am particularly not sure how this happened; maybe you can
> recheck. If you enter the command "df -h <path of the data dir
> configured>", you will find out how much disk space is available on the
> mount for which that path is configured.
>
>
>
> Regards,
>
> + Naga
>
>
>
>
>
>
> ------------------------------
> *From:* Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
> *Sent:* Friday, November 06, 2015 06:54
> *To:* user@hadoop.apache.org
> *Subject:* Re: hadoop not using whole disk for HDFS
>
> Is there a maximum amount of disk space that HDFS will use? Is 100GB that
> max? When we’re supposed to be dealing with “big data” why is the amount of
> data to be held on any one box such a small number when you’ve got
> terabytes available?
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
> *From:* Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com>
> *Sent:* Wednesday, November 04, 2015 4:38 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: hadoop not using whole disk for HDFS
>
> This is an experimental cluster and there isn’t anything I can’t lose. I
> ran into some issues. I’m running the Hortonworks distro and am managing
> things through Ambari.
>
> 1. I wasn’t able to set the config to /home/hdfs/data. I got an error that
> told me I’m not allowed to set that config to the /home directory. So I
> made it /hdfs/data.
> 2. When I restarted, the space available increased by a whopping 100GB.
>
>
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
> *From:* Naganarasimha G R (Naga) <ga...@huawei.com>
> *Sent:* Wednesday, November 04, 2015 4:26 PM
> *To:* user@hadoop.apache.org
> *Subject:* RE: hadoop not using whole disk for HDFS
>
>
> Better would be to stop the daemons, copy the data from /hadoop/hdfs/data
> to /home/hdfs/data, reconfigure dfs.datanode.data.dir to /home/hdfs/data,
> and then start the daemons. This is practical if the data is comparatively small.
>
> Ensure you have a backup if you have any critical data!
>
>
>
> Regards,
>
> + Naga
> ------------------------------
> *From:* Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
> *Sent:* Thursday, November 05, 2015 03:40
> *To:* user@hadoop.apache.org
> *Subject:* Re: hadoop not using whole disk for HDFS
>
> So like I can just create a new folder in the home directory like:
> home/hdfs/data
> and then set dfs.datanode.data.dir to:
> /hadoop/hdfs/data,home/hdfs/data
>
> Restart the node and that should do it correct?
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
> *From:* Naganarasimha G R (Naga) <ga...@huawei.com>
> *Sent:* Wednesday, November 04, 2015 3:59 PM
> *To:* user@hadoop.apache.org
> *Subject:* RE: hadoop not using whole disk for HDFS
>
>
> Hi Bob,
>
>
>
> It seems you have configured the data dir to be somewhere other than a folder
> under /home. If so, try creating another folder and adding it to
> "dfs.datanode.data.dir", separated by a comma, instead of trying to reset
> the default.
>
> It is also advised not to configure the root partition "/" as an HDFS data
> dir; if that directory fills up, the OS might fail to function properly.
>
>
>
> Regards,
>
> + Naga
> ------------------------------
> *From:* P lva [ruvikal@gmail.com]
> *Sent:* Thursday, November 05, 2015 03:11
> *To:* user@hadoop.apache.org
> *Subject:* Re: hadoop not using whole disk for HDFS
>
> What does your dfs.datanode.data.dir point to ?
>
>
> On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <
> adaryl.wakefield@hotmail.com> wrote:
>
>> Filesystem               Size  Used  Avail  Use%  Mounted on
>> /dev/mapper/centos-root   50G   12G    39G   23%  /
>> devtmpfs                  16G     0    16G    0%  /dev
>> tmpfs                     16G     0    16G    0%  /dev/shm
>> tmpfs                     16G  1.4G    15G    9%  /run
>> tmpfs                     16G     0    16G    0%  /sys/fs/cgroup
>> /dev/sda2                494M  123M   372M   25%  /boot
>> /dev/mapper/centos-home  2.7T   33M   2.7T    1%  /home
>>
>> That’s from one datanode. The second one is nearly identical. I
>> discovered that 50GB is actually a default. That seems really weird. Disk
>> space is cheap. Why would you not just use most of the disk and why is it
>> so hard to reset the default?
>>
>> Adaryl "Bob" Wakefield, MBA
>> Principal
>> Mass Street Analytics, LLC
>> 913.938.6685
>> www.linkedin.com/in/bobwakefieldmba
>> Twitter: @BobLovesData
>>
>> *From:* Chris Nauroth <cn...@hortonworks.com>
>> *Sent:* Wednesday, November 04, 2015 12:16 PM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: hadoop not using whole disk for HDFS
>>
>> How are those drives partitioned?  Is it possible that the directories
>> pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on
>> partitions that are sized to only 100 GB?  Running commands like df would
>> be a good way to check this at the OS level, independently of Hadoop.
>>
>> --Chris Nauroth
>>
>> From: MBA <ad...@hotmail.com>
>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Date: Tuesday, November 3, 2015 at 11:16 AM
>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Subject: Re: hadoop not using whole disk for HDFS
>>
>> Yeah. It has the current value of 1073741824 which is like 1.07 gig.
>>
>> B.
>> *From:* Chris Nauroth <cn...@hortonworks.com>
>> *Sent:* Tuesday, November 03, 2015 11:57 AM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: hadoop not using whole disk for HDFS
>>
>> Hi Bob,
>>
>> Does the hdfs-site.xml configuration file contain the property
>> dfs.datanode.du.reserved?  If this is defined, then the DataNode
>> intentionally will not use this space for storage of replicas.
>>
>> <property>
>>   <name>dfs.datanode.du.reserved</name>
>>   <value>0</value>
>>   <description>Reserved space in bytes per volume. Always leave this much
>> space free for non dfs use.
>>   </description>
>> </property>
>>
>> --Chris Nauroth
>>
>> From: MBA <ad...@hotmail.com>
>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Date: Tuesday, November 3, 2015 at 10:51 AM
>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Subject: hadoop not using whole disk for HDFS
>>
>> I’ve got the Hortonworks distro running on a three node cluster. For some
>> reason the disk available for HDFS is MUCH less than the total disk space.
>> Both of my data nodes have 3TB hard drives. Only 100GB of that is being
>> used for HDFS. Is it possible that I have a setting wrong somewhere?
>>
>> B.
>>
>
>

Re: hadoop not using whole disk for HDFS

Posted by "Adaryl \"Bob\" Wakefield, MBA" <ad...@hotmail.com>.
No it’s flat out saying that that config cannot be set with anything starting with /home.

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Thursday, November 05, 2015 10:58 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Hi Bob, 

I suspect Ambari does not allow creating a folder directly under /home; it may allow /home/<user_name>/hdfs, since directories under /home are expected to be user home dirs.

Regards,
+ Naga


--------------------------------------------------------------------------------

From: Naganarasimha G R (Naga) [garlanaganarasimha@huawei.com]
Sent: Friday, November 06, 2015 09:34
To: user@hadoop.apache.org
Subject: RE: hadoop not using whole disk for HDFS


Thanks Brahma, didn't realize he might have configured both directories; I was assuming Bob had configured a single new directory "/hdfs/data".
So virtually it is showing additional space.
Manually try to add a data dir in /home for your use case, and restart the datanodes.
Not sure about the impacts in Ambari, but worth a try! A more permanent solution would be to remount:
      Filesystem               Size  Used  Avail  Use%  Mounted on
      /dev/mapper/centos-home  2.7T   33M   2.7T    1%  /home


--------------------------------------------------------------------------------

From: Brahma Reddy Battula [brahmareddy.battula@huawei.com]
Sent: Friday, November 06, 2015 08:19
To: user@hadoop.apache.org
Subject: RE: hadoop not using whole disk for HDFS



For each configured dfs.datanode.data.dir, HDFS assumes it is on a separate partition and counts the capacity separately. So when another dir such as /hdfs/data is added, HDFS thinks a new partition has been added, and the capacity increases by 50GB per node, i.e. 100GB for 2 nodes.

Not allowing the /home directory to be configured for data.dir might be Ambari's constraint; instead you can manually try to add a data dir in /home for your use case, and restart the datanodes.
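
A simple way to confirm whether two configured dirs really sit on the same partition (and are therefore being double-counted) is to ask df for both paths; the directories below are the ones discussed in this thread:

    # if both directories report the same filesystem (/dev/mapper/centos-root),
    # HDFS is counting the same 50GB root partition twice
    df -h /hadoop/hdfs/data /hdfs/data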





Thanks & Regards

 Brahma Reddy Battula







--------------------------------------------------------------------------------

From: Naganarasimha G R (Naga) [garlanaganarasimha@huawei.com]
Sent: Friday, November 06, 2015 7:20 AM
To: user@hadoop.apache.org
Subject: RE: hadoop not using whole disk for HDFS


Hi Bob,



1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.

Naga: I am not sure about the HDP distro, but if you make it point to /hdfs/data, it will still be pointing to the root mount itself, i.e.

          /dev/mapper/centos-root  50G  12G  39G  23%  /


Another alternative is to mount the drive to some folder other than /home and then try.



2. When I restarted, the space available increased by a whopping 100GB.
Naga: I am particularly not sure how this happened; maybe you can recheck. If you enter the command "df -h <path of the data dir configured>", you will find out how much disk space is available on the mount for which that path is configured.
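
With the layout already shown in this thread, that check would look roughly like this (output trimmed to the relevant line):

    df -h /hdfs/data
    # Filesystem               Size  Used  Avail  Use%  Mounted on
    # /dev/mapper/centos-root   50G   12G    39G   23%  /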



Regards,

+ Naga








--------------------------------------------------------------------------------

From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Friday, November 06, 2015 06:54
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


Is there a maximum amount of disk space that HDFS will use? Is 100GB that max? When we’re supposed to be dealing with “big data” why is the amount of data to be held on any one box such a small number when you’ve got terabytes available?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Adaryl "Bob" Wakefield, MBA 
Sent: Wednesday, November 04, 2015 4:38 PM
To: user@hadoop.apache.org 
Subject: Re: hadoop not using whole disk for HDFS

This is an experimental cluster and there isn’t anything I can’t lose. I ran into some issues. I’m running the Hortonworks distro and am managing things through Ambari. 

1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.
2. When I restarted, the space available increased by a whopping 100GB.



Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Wednesday, November 04, 2015 4:26 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Better would be to stop the daemons, copy the data from /hadoop/hdfs/data to /home/hdfs/data, reconfigure dfs.datanode.data.dir to /home/hdfs/data, and then start the daemons. This is practical if the data is comparatively small.

Ensure you have a backup if you have any critical data!
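
A rough sketch of that migration on one datanode, assuming the daemon is managed manually with the Hadoop 2.x scripts rather than through Ambari, and that the new directory will be /home/hdfs/data owned by the hdfs user (owner, group and script locations may differ on your install):

    # stop the DataNode before touching its storage directory
    su - hdfs -c "hadoop-daemon.sh stop datanode"

    # create the new directory on the large /home mount and copy the blocks over
    mkdir -p /home/hdfs/data
    cp -a /hadoop/hdfs/data/. /home/hdfs/data/
    chown -R hdfs:hadoop /home/hdfs/data   # adjust owner/group to your install

    # point dfs.datanode.data.dir in hdfs-site.xml at /home/hdfs/data, then:
    su - hdfs -c "hadoop-daemon.sh start datanode"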



Regards,

+ Naga


--------------------------------------------------------------------------------

From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Thursday, November 05, 2015 03:40
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


So like I can just create a new folder in the home directory like:
home/hdfs/data
and then set dfs.datanode.data.dir to:
/hadoop/hdfs/data,home/hdfs/data

Restart the node and that should do it correct?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Wednesday, November 04, 2015 3:59 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Hi Bob,



It seems you have configured the data dir to be somewhere other than a folder under /home. If so, try creating another folder and adding it to "dfs.datanode.data.dir", separated by a comma, instead of trying to reset the default.

It is also advised not to configure the root partition "/" as an HDFS data dir; if that directory fills up, the OS might fail to function properly.



Regards,

+ Naga


--------------------------------------------------------------------------------

From: P lva [ruvikal@gmail.com]
Sent: Thursday, November 05, 2015 03:11
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


What does your dfs.datanode.data.dir point to ?



On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com> wrote:

        Filesystem Size Used Avail Use% Mounted on 
        /dev/mapper/centos-root 50G 12G 39G 23% / 
        devtmpfs 16G 0 16G 0% /dev 
        tmpfs 16G 0 16G 0% /dev/shm 
        tmpfs 16G 1.4G 15G 9% /run 
        tmpfs 16G 0 16G 0% /sys/fs/cgroup 
        /dev/sda2 494M 123M 372M 25% /boot 
        /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home 


  That’s from one datanode. The second one is nearly identical. I discovered that 50GB is actually a default. That seems really weird. Disk space is cheap. Why would you not just use most of the disk and why is it so hard to reset the default?

  Adaryl "Bob" Wakefield, MBA
  Principal
  Mass Street Analytics, LLC
  913.938.6685
  www.linkedin.com/in/bobwakefieldmba
  Twitter: @BobLovesData

  From: Chris Nauroth 
  Sent: Wednesday, November 04, 2015 12:16 PM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  How are those drives partitioned?  Is it possible that the directories pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on partitions that are sized to only 100 GB?  Running commands like df would be a good way to check this at the OS level, independently of Hadoop.

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 11:16 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: Re: hadoop not using whole disk for HDFS


  Yeah. It has the current value of 1073741824 which is like 1.07 gig.

  B.
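
For reference, 1073741824 bytes is exactly 1 GiB (1024^3), or roughly 1.07 GB in decimal units; coreutils can do the conversion:

    numfmt --to=iec 1073741824   # -> 1.0G (binary units, 1 GiB)
    numfmt --to=si  1073741824   # -> 1.1G (decimal units, about 1.07 GB)
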
  From: Chris Nauroth 
  Sent: Tuesday, November 03, 2015 11:57 AM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  Hi Bob,

  Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>0</value>
    <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
    </description>
  </property>
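
Two quick ways to check this on a running cluster, assuming the hdfs client reads the same configuration the datanodes use: hdfs getconf prints the effective value of a property, and hdfs dfsadmin -report shows the configured versus present capacity each DataNode is advertising.

    # effective reservation per volume, in bytes, as seen by the client config
    hdfs getconf -confKey dfs.datanode.du.reserved

    # per-DataNode configured capacity, DFS used and remaining
    hdfs dfsadmin -report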

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 10:51 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: hadoop not using whole disk for HDFS


  I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere? 
  B.

Re: hadoop not using whole disk for HDFS

Posted by "Adaryl \"Bob\" Wakefield, MBA" <ad...@hotmail.com>.
No it’s flat out saying that that config cannot be set with anything starting with /home.

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Thursday, November 05, 2015 10:58 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Hi Bob, 

I am suspecting Ambari would not be allowing to create a folder directly under /home, might be it will allow /home/<user_name>/hdfs, since directories under /home is expected to be users home dir.

Regards,
+ Naga


--------------------------------------------------------------------------------

From: Naganarasimha G R (Naga) [garlanaganarasimha@huawei.com]
Sent: Friday, November 06, 2015 09:34
To: user@hadoop.apache.org
Subject: RE: hadoop not using whole disk for HDFS


Thanks Brahma, dint realize he might have configured both directories and i was assuming bob has configured single new directory "/hdfs/data".   
So virtually its showing additional space, 
manually try to add a data dir in /home, for your usecase, and restart datanodes.
Not sure about the impacs in Ambari but worth a try! , more permanent solution would be better remount 
      Filesystem Size Used Avail Use% Mounted on 
      /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home 


--------------------------------------------------------------------------------

From: Brahma Reddy Battula [brahmareddy.battula@huawei.com]
Sent: Friday, November 06, 2015 08:19
To: user@hadoop.apache.org
Subject: RE: hadoop not using whole disk for HDFS



For each configured dfs.datanode.data.dir , HDFS thinks its in separate partiotion and counts the capacity separately. So when another dir is added /hdfs/data, HDFS thinks new partition is added, So it increased the capacity 50GB per node. i.e. 100GB for 2 Nodes.

Not allowing /home directory to configure for data.dir might be ambari's constraint, instead you can manually try to add a data dir in /home, for your usecase, and restart datanodes.





Thanks & Regards

 Brahma Reddy Battula







--------------------------------------------------------------------------------

From: Naganarasimha G R (Naga) [garlanaganarasimha@huawei.com]
Sent: Friday, November 06, 2015 7:20 AM
To: user@hadoop.apache.org
Subject: RE: hadoop not using whole disk for HDFS


Hi Bob,



1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.

Naga : I am not sure about the HDP Distro but if you make it point to /hdfs/data, still it will be pointing to the root mount itself i.e.

          /dev/mapper/centos-root 50G 12G 39G 23% / 


Other Alternative is to mount the drive to some other folder other than /home and then try.



2. When I restarted, the space available increased by a whopping 100GB.
Naga : I am particularly not sure how this happened may be you can again recheck if you enter the command "df -h <path of the NM data dir configured>" you will find out how much disk space is available on the related mount for which the path is configured.



Regards,

+ Naga








--------------------------------------------------------------------------------

From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Friday, November 06, 2015 06:54
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


Is there a maximum amount of disk space that HDFS will use? Is 100GB that max? When we’re supposed to be dealing with “big data” why is the amount of data to be held on any one box such a small number when you’ve got terabytes available?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Adaryl "Bob" Wakefield, MBA 
Sent: Wednesday, November 04, 2015 4:38 PM
To: user@hadoop.apache.org 
Subject: Re: hadoop not using whole disk for HDFS

This is an experimental cluster and there isn’t anything I can’t lose. I ran into some issues. I’m running the Hortonworks distro and am managing things through Ambari. 

1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.
2. When I restarted, the space available increased by a whopping 100GB.



Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Wednesday, November 04, 2015 4:26 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Better would be to stop the daemons and copy the data from /hadoop/hdfs/data to /home/hdfs/data , reconfigure dfs.datanode.data.dir to /home/hdfs/data and then start the daemons. If the data is comparitively less !

Ensure you have the backup if have any critical data !



Regards,

+ Naga


--------------------------------------------------------------------------------

From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Thursday, November 05, 2015 03:40
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


So like I can just create a new folder in the home directory like:
home/hdfs/data
and then set dfs.datanode.data.dir to:
/hadoop/hdfs/data,home/hdfs/data

Restart the node and that should do it correct?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Wednesday, November 04, 2015 3:59 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Hi Bob,



Seems like you have configured to disk dir to be other than an folder in /home, if so try creating another folder and add to "dfs.datanode.data.dir" seperated by comma instead of trying to reset the default.

And its also advised not to use the root partition "/" to be configured for HDFS data dir, if the Dir usage hits the maximum then OS might fail to function properly.



Regards,

+ Naga


--------------------------------------------------------------------------------

From: P lva [ruvikal@gmail.com]
Sent: Thursday, November 05, 2015 03:11
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


What does your dfs.datanode.data.dir point to ?



On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com> wrote:

        Filesystem Size Used Avail Use% Mounted on 
        /dev/mapper/centos-root 50G 12G 39G 23% / 
        devtmpfs 16G 0 16G 0% /dev 
        tmpfs 16G 0 16G 0% /dev/shm 
        tmpfs 16G 1.4G 15G 9% /run 
        tmpfs 16G 0 16G 0% /sys/fs/cgroup 
        /dev/sda2 494M 123M 372M 25% /boot 
        /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home 


  That’s from one datanode. The second one is nearly identical. I discovered that 50GB is actually a default. That seems really weird. Disk space is cheap. Why would you not just use most of the disk and why is it so hard to reset the default?

  Adaryl "Bob" Wakefield, MBA
  Principal
  Mass Street Analytics, LLC
  913.938.6685
  www.linkedin.com/in/bobwakefieldmba
  Twitter: @BobLovesData

  From: Chris Nauroth 
  Sent: Wednesday, November 04, 2015 12:16 PM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  How are those drives partitioned?  Is it possible that the directories pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on partitions that are sized to only 100 GB?  Running commands like df would be a good way to check this at the OS level, independently of Hadoop.

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 11:16 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: Re: hadoop not using whole disk for HDFS


  Yeah. It has the current value of 1073741824 which is like 1.07 gig.

  B.
  From: Chris Nauroth 
  Sent: Tuesday, November 03, 2015 11:57 AM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  Hi Bob,

  Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>0</value>
    <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
    </description>
  </property>

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 10:51 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: hadoop not using whole disk for HDFS


  I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere? 
  B.

Re: hadoop not using whole disk for HDFS

Posted by "Adaryl \"Bob\" Wakefield, MBA" <ad...@hotmail.com>.
No it’s flat out saying that that config cannot be set with anything starting with /home.

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Thursday, November 05, 2015 10:58 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Hi Bob, 

I am suspecting Ambari would not be allowing to create a folder directly under /home, might be it will allow /home/<user_name>/hdfs, since directories under /home is expected to be users home dir.

Regards,
+ Naga


--------------------------------------------------------------------------------

From: Naganarasimha G R (Naga) [garlanaganarasimha@huawei.com]
Sent: Friday, November 06, 2015 09:34
To: user@hadoop.apache.org
Subject: RE: hadoop not using whole disk for HDFS


Thanks Brahma, dint realize he might have configured both directories and i was assuming bob has configured single new directory "/hdfs/data".   
So virtually its showing additional space, 
manually try to add a data dir in /home, for your usecase, and restart datanodes.
Not sure about the impacs in Ambari but worth a try! , more permanent solution would be better remount 
      Filesystem Size Used Avail Use% Mounted on 
      /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home 


--------------------------------------------------------------------------------

From: Brahma Reddy Battula [brahmareddy.battula@huawei.com]
Sent: Friday, November 06, 2015 08:19
To: user@hadoop.apache.org
Subject: RE: hadoop not using whole disk for HDFS



For each configured dfs.datanode.data.dir , HDFS thinks its in separate partiotion and counts the capacity separately. So when another dir is added /hdfs/data, HDFS thinks new partition is added, So it increased the capacity 50GB per node. i.e. 100GB for 2 Nodes.

Not allowing /home directory to configure for data.dir might be ambari's constraint, instead you can manually try to add a data dir in /home, for your usecase, and restart datanodes.





Thanks & Regards

 Brahma Reddy Battula







--------------------------------------------------------------------------------

From: Naganarasimha G R (Naga) [garlanaganarasimha@huawei.com]
Sent: Friday, November 06, 2015 7:20 AM
To: user@hadoop.apache.org
Subject: RE: hadoop not using whole disk for HDFS


Hi Bob,



1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.

Naga : I am not sure about the HDP Distro but if you make it point to /hdfs/data, still it will be pointing to the root mount itself i.e.

          /dev/mapper/centos-root 50G 12G 39G 23% / 


Other Alternative is to mount the drive to some other folder other than /home and then try.



2. When I restarted, the space available increased by a whopping 100GB.
Naga : I am particularly not sure how this happened may be you can again recheck if you enter the command "df -h <path of the NM data dir configured>" you will find out how much disk space is available on the related mount for which the path is configured.



Regards,

+ Naga








--------------------------------------------------------------------------------

From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Friday, November 06, 2015 06:54
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


Is there a maximum amount of disk space that HDFS will use? Is 100GB that max? When we’re supposed to be dealing with “big data” why is the amount of data to be held on any one box such a small number when you’ve got terabytes available?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Adaryl "Bob" Wakefield, MBA 
Sent: Wednesday, November 04, 2015 4:38 PM
To: user@hadoop.apache.org 
Subject: Re: hadoop not using whole disk for HDFS

This is an experimental cluster and there isn’t anything I can’t lose. I ran into some issues. I’m running the Hortonworks distro and am managing things through Ambari. 

1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.
2. When I restarted, the space available increased by a whopping 100GB.



Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Wednesday, November 04, 2015 4:26 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Better would be to stop the daemons and copy the data from /hadoop/hdfs/data to /home/hdfs/data , reconfigure dfs.datanode.data.dir to /home/hdfs/data and then start the daemons. If the data is comparitively less !

Ensure you have the backup if have any critical data !



Regards,

+ Naga


--------------------------------------------------------------------------------

From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Thursday, November 05, 2015 03:40
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


So like I can just create a new folder in the home directory like:
home/hdfs/data
and then set dfs.datanode.data.dir to:
/hadoop/hdfs/data,home/hdfs/data

Restart the node and that should do it correct?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Wednesday, November 04, 2015 3:59 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Hi Bob,



Seems like you have configured the data dir to be something other than a folder in /home; if so, try creating another folder and adding it to "dfs.datanode.data.dir", separated by a comma, instead of trying to reset the default.

It is also advised not to configure the HDFS data dir on the root partition "/"; if the dir usage hits the maximum, the OS might fail to function properly.



Regards,

+ Naga


--------------------------------------------------------------------------------

From: P lva [ruvikal@gmail.com]
Sent: Thursday, November 05, 2015 03:11
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


What does your dfs.datanode.data.dir point to ?



On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com> wrote:

        Filesystem Size Used Avail Use% Mounted on 
        /dev/mapper/centos-root 50G 12G 39G 23% / 
        devtmpfs 16G 0 16G 0% /dev 
        tmpfs 16G 0 16G 0% /dev/shm 
        tmpfs 16G 1.4G 15G 9% /run 
        tmpfs 16G 0 16G 0% /sys/fs/cgroup 
        /dev/sda2 494M 123M 372M 25% /boot 
        /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home 


  That’s from one datanode. The second one is nearly identical. I discovered that 50GB is actually a default. That seems really weird. Disk space is cheap. Why would you not just use most of the disk and why is it so hard to reset the default?

  Adaryl "Bob" Wakefield, MBA
  Principal
  Mass Street Analytics, LLC
  913.938.6685
  www.linkedin.com/in/bobwakefieldmba
  Twitter: @BobLovesData

  From: Chris Nauroth 
  Sent: Wednesday, November 04, 2015 12:16 PM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  How are those drives partitioned?  Is it possible that the directories pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on partitions that are sized to only 100 GB?  Running commands like df would be a good way to check this at the OS level, independently of Hadoop.
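
  To see the same thing from Hadoop's side, a quick sketch (run as the HDFS superuser; the exact report format varies by version):

    hdfs dfsadmin -report        # shows Configured Capacity and DFS Remaining per DataNode
    df -h /hadoop/hdfs/data      # OS view of the partition backing the data dir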

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 11:16 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: Re: hadoop not using whole disk for HDFS


  Yeah. It has the current value of 1073741824 which is like 1.07 gig.
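
  (For reference, 1073741824 bytes is 1024^3 bytes, i.e. 1 GiB, or about 1.07 GB in decimal units, reserved per volume.)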

  B.
  From: Chris Nauroth 
  Sent: Tuesday, November 03, 2015 11:57 AM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  Hi Bob,

  Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>0</value>
    <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
    </description>
  </property>

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 10:51 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: hadoop not using whole disk for HDFS


  I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere? 
  B.

Re: hadoop not using whole disk for HDFS

Posted by "Adaryl \"Bob\" Wakefield, MBA" <ad...@hotmail.com>.
No, it flat out says that the config cannot be set to anything starting with /home.

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Thursday, November 05, 2015 10:58 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Hi Bob, 

I am suspecting Ambari would not allow creating a folder directly under /home; it might allow /home/<user_name>/hdfs, since directories under /home are expected to be users' home dirs.

Regards,
+ Naga


--------------------------------------------------------------------------------

From: Naganarasimha G R (Naga) [garlanaganarasimha@huawei.com]
Sent: Friday, November 06, 2015 09:34
To: user@hadoop.apache.org
Subject: RE: hadoop not using whole disk for HDFS


Thanks Brahma, didn't realize he might have configured both directories; I was assuming Bob had configured a single new directory "/hdfs/data".
So virtually it is showing additional space.
Manually try to add a data dir in /home for your use case, and restart the datanodes.
Not sure about the impacts in Ambari, but worth a try! A more permanent solution would be to remount
      Filesystem Size Used Avail Use% Mounted on 
      /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home 


--------------------------------------------------------------------------------

From: Brahma Reddy Battula [brahmareddy.battula@huawei.com]
Sent: Friday, November 06, 2015 08:19
To: user@hadoop.apache.org
Subject: RE: hadoop not using whole disk for HDFS



For each configured dfs.datanode.data.dir , HDFS thinks its in separate partiotion and counts the capacity separately. So when another dir is added /hdfs/data, HDFS thinks new partition is added, So it increased the capacity 50GB per node. i.e. 100GB for 2 Nodes.

Not allowing /home directory to configure for data.dir might be ambari's constraint, instead you can manually try to add a data dir in /home, for your usecase, and restart datanodes.





Thanks & Regards

 Brahma Reddy Battula







--------------------------------------------------------------------------------

From: Naganarasimha G R (Naga) [garlanaganarasimha@huawei.com]
Sent: Friday, November 06, 2015 7:20 AM
To: user@hadoop.apache.org
Subject: RE: hadoop not using whole disk for HDFS


Hi Bob,



1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.

Naga : I am not sure about the HDP Distro but if you make it point to /hdfs/data, still it will be pointing to the root mount itself i.e.

          /dev/mapper/centos-root 50G 12G 39G 23% / 


Other Alternative is to mount the drive to some other folder other than /home and then try.



2. When I restarted, the space available increased by a whopping 100GB.
Naga : I am particularly not sure how this happened may be you can again recheck if you enter the command "df -h <path of the NM data dir configured>" you will find out how much disk space is available on the related mount for which the path is configured.



Regards,

+ Naga








--------------------------------------------------------------------------------

From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Friday, November 06, 2015 06:54
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


Is there a maximum amount of disk space that HDFS will use? Is 100GB that max? When we’re supposed to be dealing with “big data” why is the amount of data to be held on any one box such a small number when you’ve got terabytes available?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Adaryl "Bob" Wakefield, MBA 
Sent: Wednesday, November 04, 2015 4:38 PM
To: user@hadoop.apache.org 
Subject: Re: hadoop not using whole disk for HDFS

This is an experimental cluster and there isn’t anything I can’t lose. I ran into some issues. I’m running the Hortonworks distro and am managing things through Ambari. 

1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.
2. When I restarted, the space available increased by a whopping 100GB.



Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Wednesday, November 04, 2015 4:26 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Better would be to stop the daemons and copy the data from /hadoop/hdfs/data to /home/hdfs/data , reconfigure dfs.datanode.data.dir to /home/hdfs/data and then start the daemons. If the data is comparitively less !

Ensure you have the backup if have any critical data !



Regards,

+ Naga


--------------------------------------------------------------------------------

From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Thursday, November 05, 2015 03:40
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


So like I can just create a new folder in the home directory like:
home/hdfs/data
and then set dfs.datanode.data.dir to:
/hadoop/hdfs/data,home/hdfs/data

Restart the node and that should do it correct?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Wednesday, November 04, 2015 3:59 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Hi Bob,



Seems like you have configured to disk dir to be other than an folder in /home, if so try creating another folder and add to "dfs.datanode.data.dir" seperated by comma instead of trying to reset the default.

And its also advised not to use the root partition "/" to be configured for HDFS data dir, if the Dir usage hits the maximum then OS might fail to function properly.



Regards,

+ Naga


--------------------------------------------------------------------------------

From: P lva [ruvikal@gmail.com]
Sent: Thursday, November 05, 2015 03:11
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


What does your dfs.datanode.data.dir point to ?



On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com> wrote:

        Filesystem Size Used Avail Use% Mounted on 
        /dev/mapper/centos-root 50G 12G 39G 23% / 
        devtmpfs 16G 0 16G 0% /dev 
        tmpfs 16G 0 16G 0% /dev/shm 
        tmpfs 16G 1.4G 15G 9% /run 
        tmpfs 16G 0 16G 0% /sys/fs/cgroup 
        /dev/sda2 494M 123M 372M 25% /boot 
        /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home 


  That’s from one datanode. The second one is nearly identical. I discovered that 50GB is actually a default. That seems really weird. Disk space is cheap. Why would you not just use most of the disk and why is it so hard to reset the default?

  Adaryl "Bob" Wakefield, MBA
  Principal
  Mass Street Analytics, LLC
  913.938.6685
  www.linkedin.com/in/bobwakefieldmba
  Twitter: @BobLovesData

  From: Chris Nauroth 
  Sent: Wednesday, November 04, 2015 12:16 PM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  How are those drives partitioned?  Is it possible that the directories pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on partitions that are sized to only 100 GB?  Running commands like df would be a good way to check this at the OS level, independently of Hadoop.

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 11:16 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: Re: hadoop not using whole disk for HDFS


  Yeah. It has the current value of 1073741824 which is like 1.07 gig.

  B.
  From: Chris Nauroth 
  Sent: Tuesday, November 03, 2015 11:57 AM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  Hi Bob,

  Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>0</value>
    <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
    </description>
  </property>

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 10:51 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: hadoop not using whole disk for HDFS


  I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere? 
  B.

RE: hadoop not using whole disk for HDFS

Posted by "Naganarasimha G R (Naga)" <ga...@huawei.com>.
Hi Bob,

I am suspecting Ambari would not allow creating a folder directly under /home; it might allow /home/<user_name>/hdfs, since directories under /home are expected to be users' home dirs.
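
If that is the constraint, a small sketch of a workaround (directory path and hdfs:hadoop ownership are assumptions; the data dir just needs to be on the large /home volume and writable by the HDFS user):

  # (run as root) create a data dir on the /home volume
  mkdir -p /home/hdfs/data
  chown -R hdfs:hadoop /home/hdfs/data   # hdfs:hadoop assumed
  # then add /home/hdfs/data to dfs.datanode.data.dir and restart the DataNodes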

Regards,
+ Naga
________________________________
From: Naganarasimha G R (Naga) [garlanaganarasimha@huawei.com]
Sent: Friday, November 06, 2015 09:34
To: user@hadoop.apache.org
Subject: RE: hadoop not using whole disk for HDFS

Thanks Brahma, dint realize he might have configured both directories and i was assuming bob has configured single new directory "/hdfs/data".
So virtually its showing additional space,
manually try to add a data dir in /home, for your usecase, and restart datanodes.
Not sure about the impacs in Ambari but worth a try! , more permanent solution would be better remount
Filesystem      Size    Used    Avail   Use%    Mounted on
/dev/mapper/centos-home 2.7T    33M     2.7T    1%      /home
________________________________
From: Brahma Reddy Battula [brahmareddy.battula@huawei.com]
Sent: Friday, November 06, 2015 08:19
To: user@hadoop.apache.org
Subject: RE: hadoop not using whole disk for HDFS


For each configured dfs.datanode.data.dir , HDFS thinks its in separate partiotion and counts the capacity separately. So when another dir is added /hdfs/data, HDFS thinks new partition is added, So it increased the capacity 50GB per node. i.e. 100GB for 2 Nodes.

Not allowing /home directory to configure for data.dir might be ambari's constraint, instead you can manually try to add a data dir in /home, for your usecase, and restart datanodes.




Thanks & Regards

 Brahma Reddy Battula




________________________________
From: Naganarasimha G R (Naga) [garlanaganarasimha@huawei.com]
Sent: Friday, November 06, 2015 7:20 AM
To: user@hadoop.apache.org
Subject: RE: hadoop not using whole disk for HDFS


Hi Bob,



1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.

Naga : I am not sure about the HDP Distro but if you make it point to /hdfs/data, still it will be pointing to the root mount itself i.e.

    /dev/mapper/centos-root     50G     12G     39G     23%     /

Other Alternative is to mount the drive to some other folder other than /home and then try.



2. When I restarted, the space available increased by a whopping 100GB.

Naga : I am particularly not sure how this happened may be you can again recheck if you enter the command "df -h <path of the NM data dir configured>"  you will find out how much disk space is available on the related mount for which the path is configured.



Regards,

+ Naga







________________________________

From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Friday, November 06, 2015 06:54
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS

Is there a maximum amount of disk space that HDFS will use? Is 100GB that max? When we’re supposed to be dealing with “big data” why is the amount of data to be held on any one box such a small number when you’ve got terabytes available?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Adaryl "Bob" Wakefield, MBA<ma...@hotmail.com>
Sent: Wednesday, November 04, 2015 4:38 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: hadoop not using whole disk for HDFS

This is an experimental cluster and there isn’t anything I can’t lose. I ran into some issues. I’m running the Hortonworks distro and am managing things through Ambari.

1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.
2. When I restarted, the space available increased by a whopping 100GB.



Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga)<ma...@huawei.com>
Sent: Wednesday, November 04, 2015 4:26 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: hadoop not using whole disk for HDFS


Better would be to stop the daemons and copy the data from /hadoop/hdfs/data to /home/hdfs/data , reconfigure dfs.datanode.data.dir to /home/hdfs/data and then start the daemons. If the data is comparitively less !

Ensure you have the backup if have any critical data !



Regards,

+ Naga

________________________________
From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Thursday, November 05, 2015 03:40
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS

So like I can just create a new folder in the home directory like:
home/hdfs/data
and then set dfs.datanode.data.dir to:
/hadoop/hdfs/data,home/hdfs/data

Restart the node and that should do it correct?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga)<ma...@huawei.com>
Sent: Wednesday, November 04, 2015 3:59 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: hadoop not using whole disk for HDFS


Hi Bob,



Seems like you have configured to disk dir to be other than an folder in /home, if so try creating another folder and add to "dfs.datanode.data.dir" seperated by comma instead of trying to reset the default.

And its also advised not to use the root partition "/" to be configured for HDFS data dir, if the Dir usage hits the maximum then OS might fail to function properly.



Regards,

+ Naga

________________________________
From: P lva [ruvikal@gmail.com]
Sent: Thursday, November 05, 2015 03:11
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS

What does your dfs.datanode.data.dir point to ?


On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com>> wrote:
Filesystem      Size    Used    Avail   Use%    Mounted on
/dev/mapper/centos-root 50G     12G     39G     23%     /
devtmpfs        16G     0       16G     0%      /dev
tmpfs   16G     0       16G     0%      /dev/shm
tmpfs   16G     1.4G    15G     9%      /run
tmpfs   16G     0       16G     0%      /sys/fs/cgroup
/dev/sda2       494M    123M    372M    25%     /boot
/dev/mapper/centos-home 2.7T    33M     2.7T    1%      /home

That’s from one datanode. The second one is nearly identical. I discovered that 50GB is actually a default. That seems really weird. Disk space is cheap. Why would you not just use most of the disk and why is it so hard to reset the default?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685<tel:913.938.6685>
www.linkedin.com/in/bobwakefieldmba<http://www.linkedin.com/in/bobwakefieldmba>
Twitter: @BobLovesData

From: Chris Nauroth<ma...@hortonworks.com>
Sent: Wednesday, November 04, 2015 12:16 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: hadoop not using whole disk for HDFS

How are those drives partitioned?  Is it possible that the directories pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on partitions that are sized to only 100 GB?  Running commands like df would be a good way to check this at the OS level, independently of Hadoop.

--Chris Nauroth

From: MBA <ad...@hotmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Tuesday, November 3, 2015 at 11:16 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Re: hadoop not using whole disk for HDFS

Yeah. It has the current value of 1073741824 which is like 1.07 gig.

B.
From: Chris Nauroth<ma...@hortonworks.com>
Sent: Tuesday, November 03, 2015 11:57 AM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: hadoop not using whole disk for HDFS

Hi Bob,

Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>0</value>
  <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
  </description>
</property>

--Chris Nauroth

From: MBA <ad...@hotmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Tuesday, November 3, 2015 at 10:51 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: hadoop not using whole disk for HDFS

I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere?

B.


Re: hadoop not using whole disk for HDFS

Posted by "Adaryl \"Bob\" Wakefield, MBA" <ad...@hotmail.com>.
So when you say remount, what exactly am I remounting? /dev/mapper/centos-home?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Thursday, November 05, 2015 10:04 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Thanks Brahma, dint realize he might have configured both directories and i was assuming bob has configured single new directory "/hdfs/data".   
So virtually its showing additional space, 
manually try to add a data dir in /home, for your usecase, and restart datanodes.
Not sure about the impacs in Ambari but worth a try! , more permanent solution would be better remount 
      Filesystem Size Used Avail Use% Mounted on 
      /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home 


--------------------------------------------------------------------------------

From: Brahma Reddy Battula [brahmareddy.battula@huawei.com]
Sent: Friday, November 06, 2015 08:19
To: user@hadoop.apache.org
Subject: RE: hadoop not using whole disk for HDFS



For each configured dfs.datanode.data.dir , HDFS thinks its in separate partiotion and counts the capacity separately. So when another dir is added /hdfs/data, HDFS thinks new partition is added, So it increased the capacity 50GB per node. i.e. 100GB for 2 Nodes.

Not allowing /home directory to configure for data.dir might be ambari's constraint, instead you can manually try to add a data dir in /home, for your usecase, and restart datanodes.





Thanks & Regards

 Brahma Reddy Battula







--------------------------------------------------------------------------------

From: Naganarasimha G R (Naga) [garlanaganarasimha@huawei.com]
Sent: Friday, November 06, 2015 7:20 AM
To: user@hadoop.apache.org
Subject: RE: hadoop not using whole disk for HDFS


Hi Bob,



1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.

Naga : I am not sure about the HDP Distro but if you make it point to /hdfs/data, still it will be pointing to the root mount itself i.e.

          /dev/mapper/centos-root 50G 12G 39G 23% / 


Other Alternative is to mount the drive to some other folder other than /home and then try.



2. When I restarted, the space available increased by a whopping 100GB.
Naga : I am particularly not sure how this happened may be you can again recheck if you enter the command "df -h <path of the NM data dir configured>" you will find out how much disk space is available on the related mount for which the path is configured.



Regards,

+ Naga








--------------------------------------------------------------------------------

From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Friday, November 06, 2015 06:54
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


Is there a maximum amount of disk space that HDFS will use? Is 100GB that max? When we’re supposed to be dealing with “big data” why is the amount of data to be held on any one box such a small number when you’ve got terabytes available?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Adaryl "Bob" Wakefield, MBA 
Sent: Wednesday, November 04, 2015 4:38 PM
To: user@hadoop.apache.org 
Subject: Re: hadoop not using whole disk for HDFS

This is an experimental cluster and there isn’t anything I can’t lose. I ran into some issues. I’m running the Hortonworks distro and am managing things through Ambari. 

1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.
2. When I restarted, the space available increased by a whopping 100GB.



Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Wednesday, November 04, 2015 4:26 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Better would be to stop the daemons and copy the data from /hadoop/hdfs/data to /home/hdfs/data , reconfigure dfs.datanode.data.dir to /home/hdfs/data and then start the daemons. If the data is comparitively less !

Ensure you have the backup if have any critical data !



Regards,

+ Naga


--------------------------------------------------------------------------------

From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Thursday, November 05, 2015 03:40
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


So like I can just create a new folder in the home directory like:
home/hdfs/data
and then set dfs.datanode.data.dir to:
/hadoop/hdfs/data,home/hdfs/data

Restart the node and that should do it correct?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Wednesday, November 04, 2015 3:59 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Hi Bob,



Seems like you have configured to disk dir to be other than an folder in /home, if so try creating another folder and add to "dfs.datanode.data.dir" seperated by comma instead of trying to reset the default.

And its also advised not to use the root partition "/" to be configured for HDFS data dir, if the Dir usage hits the maximum then OS might fail to function properly.



Regards,

+ Naga


--------------------------------------------------------------------------------

From: P lva [ruvikal@gmail.com]
Sent: Thursday, November 05, 2015 03:11
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


What does your dfs.datanode.data.dir point to ?



On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com> wrote:

        Filesystem Size Used Avail Use% Mounted on 
        /dev/mapper/centos-root 50G 12G 39G 23% / 
        devtmpfs 16G 0 16G 0% /dev 
        tmpfs 16G 0 16G 0% /dev/shm 
        tmpfs 16G 1.4G 15G 9% /run 
        tmpfs 16G 0 16G 0% /sys/fs/cgroup 
        /dev/sda2 494M 123M 372M 25% /boot 
        /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home 


  That’s from one datanode. The second one is nearly identical. I discovered that 50GB is actually a default. That seems really weird. Disk space is cheap. Why would you not just use most of the disk and why is it so hard to reset the default?

  Adaryl "Bob" Wakefield, MBA
  Principal
  Mass Street Analytics, LLC
  913.938.6685
  www.linkedin.com/in/bobwakefieldmba
  Twitter: @BobLovesData

  From: Chris Nauroth 
  Sent: Wednesday, November 04, 2015 12:16 PM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  How are those drives partitioned?  Is it possible that the directories pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on partitions that are sized to only 100 GB?  Running commands like df would be a good way to check this at the OS level, independently of Hadoop.

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 11:16 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: Re: hadoop not using whole disk for HDFS


  Yeah. It has the current value of 1073741824 which is like 1.07 gig.
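  (For reference: 1073741824 bytes = 1024^3 bytes = 1 GiB ≈ 1.07 GB, so about 1 GB is being kept free on each volume for non-DFS use.)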

  B.
  From: Chris Nauroth 
  Sent: Tuesday, November 03, 2015 11:57 AM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  Hi Bob,

  Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>0</value>
    <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
    </description>
  </property>
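  If there is any doubt about which value is actually in effect on a node, one option (assuming the HDFS client on that node reads the same hdfs-site.xml) is:

    hdfs getconf -confKey dfs.datanode.du.reserved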

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 10:51 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: hadoop not using whole disk for HDFS


  I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere? 
  B.

Re: hadoop not using whole disk for HDFS

Posted by "Adaryl \"Bob\" Wakefield, MBA" <ad...@hotmail.com>.
So when you say remount, what exactly am I remounting? /dev/mapper/centos-home?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Thursday, November 05, 2015 10:04 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Thanks Brahma, I didn't realize he might have configured both directories; I was assuming Bob had configured a single new directory "/hdfs/data".
So the additional space it is showing is only virtual (the same partition counted twice).
You can manually try to add a data dir in /home for your use case and restart the datanodes.
I am not sure about the impacts in Ambari, but it is worth a try! A more permanent solution would be to remount:
      Filesystem Size Used Avail Use% Mounted on 
      /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home 
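A rough sketch of such a remount, assuming /home holds nothing that is still needed and that a new mount point such as /data is acceptable (the exact device and fstab entry should be verified on the node first):

    umount /home
    mkdir -p /data
    mount /dev/mapper/centos-home /data      # the 2.7T volume now backs /data
    # update the corresponding /etc/fstab entry so the mount survives a reboot,
    # then add a directory such as /data/hdfs/data to dfs.datanode.data.dir and restart the DataNodes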


--------------------------------------------------------------------------------

From: Brahma Reddy Battula [brahmareddy.battula@huawei.com]
Sent: Friday, November 06, 2015 08:19
To: user@hadoop.apache.org
Subject: RE: hadoop not using whole disk for HDFS



For each configured dfs.datanode.data.dir, HDFS assumes the directory is on a separate partition and counts its capacity separately. So when another dir, /hdfs/data, was added, HDFS assumed a new partition had been added and increased the capacity by 50GB per node, i.e. 100GB across the 2 nodes.

Not allowing the /home directory to be configured for data.dir might be an Ambari constraint; instead, you can manually try to add a data dir in /home for your use case and restart the datanodes.
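One way to confirm how the configured directories translate into reported capacity is the NameNode report; the exact output format varies by release, but each datanode section lists its configured capacity:

    hdfs dfsadmin -report
    # Configured Capacity: ...   (per datanode, roughly the sum of the filesystems behind each data dir, less reserved space)
    # DFS Used / DFS Remaining: ...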





Thanks & Regards

 Brahma Reddy Battula







--------------------------------------------------------------------------------

From: Naganarasimha G R (Naga) [garlanaganarasimha@huawei.com]
Sent: Friday, November 06, 2015 7:20 AM
To: user@hadoop.apache.org
Subject: RE: hadoop not using whole disk for HDFS


Hi Bob,



1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.

Naga : I am not sure about the HDP distro, but if you make it point to /hdfs/data, it will still be pointing to the root mount itself, i.e.

          /dev/mapper/centos-root 50G 12G 39G 23% / 


The other alternative is to mount the drive to some folder other than /home and then try.



2. When I restarted, the space available increased by a whopping 100GB.
Naga : I am not particularly sure how this happened. You can recheck by running "df -h <path of the configured data dir>"; that will show how much disk space is available on the mount on which the configured path resides.
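For example, with the paths from this thread (output illustrative, taken from the df listing earlier in the thread), both directories sit on the same 50G root filesystem:

    df -h /hadoop/hdfs/data /hdfs/data
    # Filesystem               Size  Used Avail Use% Mounted on
    # /dev/mapper/centos-root   50G   12G   39G  23% /
    # /dev/mapper/centos-root   50G   12G   39G  23% /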



Regards,

+ Naga








--------------------------------------------------------------------------------

From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Friday, November 06, 2015 06:54
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


Is there a maximum amount of disk space that HDFS will use? Is 100GB that max? When we’re supposed to be dealing with “big data” why is the amount of data to be held on any one box such a small number when you’ve got terabytes available?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Adaryl "Bob" Wakefield, MBA 
Sent: Wednesday, November 04, 2015 4:38 PM
To: user@hadoop.apache.org 
Subject: Re: hadoop not using whole disk for HDFS

This is an experimental cluster and there isn’t anything I can’t lose. I ran into some issues. I’m running the Hortonworks distro and am managing things through Ambari. 

1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.
2. When I restarted, the space available increased by a whopping 100GB.



Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Wednesday, November 04, 2015 4:26 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

If the amount of data is comparatively small, a better approach would be to stop the daemons, copy the data from /hadoop/hdfs/data to /home/hdfs/data, reconfigure dfs.datanode.data.dir to /home/hdfs/data, and then start the daemons again.

Ensure you have a backup if you have any critical data!



Regards,

+ Naga


--------------------------------------------------------------------------------

From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Thursday, November 05, 2015 03:40
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


So like I can just create a new folder in the home directory like:
home/hdfs/data
and then set dfs.datanode.data.dir to:
/hadoop/hdfs/data,home/hdfs/data

Restart the node and that should do it correct?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Wednesday, November 04, 2015 3:59 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Hi Bob,



It seems you have configured the data dir to be something other than a folder in /home. If so, try creating another folder there and adding it to "dfs.datanode.data.dir", separated by a comma, instead of trying to reset the default.

It is also advisable not to configure the root partition "/" as an HDFS data dir; if usage of that dir hits its maximum, the OS might fail to function properly.



Regards,

+ Naga


--------------------------------------------------------------------------------

From: P lva [ruvikal@gmail.com]
Sent: Thursday, November 05, 2015 03:11
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


What does your dfs.datanode.data.dir point to ?



On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com> wrote:

        Filesystem Size Used Avail Use% Mounted on 
        /dev/mapper/centos-root 50G 12G 39G 23% / 
        devtmpfs 16G 0 16G 0% /dev 
        tmpfs 16G 0 16G 0% /dev/shm 
        tmpfs 16G 1.4G 15G 9% /run 
        tmpfs 16G 0 16G 0% /sys/fs/cgroup 
        /dev/sda2 494M 123M 372M 25% /boot 
        /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home 


  That’s from one datanode. The second one is nearly identical. I discovered that 50GB is actually a default. That seems really weird. Disk space is cheap. Why would you not just use most of the disk and why is it so hard to reset the default?

  Adaryl "Bob" Wakefield, MBA
  Principal
  Mass Street Analytics, LLC
  913.938.6685
  www.linkedin.com/in/bobwakefieldmba
  Twitter: @BobLovesData

  From: Chris Nauroth 
  Sent: Wednesday, November 04, 2015 12:16 PM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  How are those drives partitioned?  Is it possible that the directories pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on partitions that are sized to only 100 GB?  Running commands like df would be a good way to check this at the OS level, independently of Hadoop.

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 11:16 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: Re: hadoop not using whole disk for HDFS


  Yeah. It has the current value of 1073741824 which is like 1.07 gig.

  B.
  From: Chris Nauroth 
  Sent: Tuesday, November 03, 2015 11:57 AM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  Hi Bob,

  Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>0</value>
    <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
    </description>
  </property>

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 10:51 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: hadoop not using whole disk for HDFS


  I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere? 
  B.

RE: hadoop not using whole disk for HDFS

Posted by "Naganarasimha G R (Naga)" <ga...@huawei.com>.
Thanks Brahma, I didn't realize he might have configured both directories; I was assuming Bob had configured a single new directory "/hdfs/data".
So the additional space it is showing is only virtual (the same partition counted twice).
You can manually try to add a data dir in /home for your use case and restart the datanodes.
I am not sure about the impacts in Ambari, but it is worth a try! A more permanent solution would be to remount:
Filesystem      Size    Used    Avail   Use%    Mounted on
/dev/mapper/centos-home 2.7T    33M     2.7T    1%      /home
________________________________
From: Brahma Reddy Battula [brahmareddy.battula@huawei.com]
Sent: Friday, November 06, 2015 08:19
To: user@hadoop.apache.org
Subject: RE: hadoop not using whole disk for HDFS


For each configured dfs.datanode.data.dir, HDFS assumes the directory is on a separate partition and counts its capacity separately. So when another dir, /hdfs/data, was added, HDFS assumed a new partition had been added and increased the capacity by 50GB per node, i.e. 100GB across the 2 nodes.

Not allowing the /home directory to be configured for data.dir might be an Ambari constraint; instead, you can manually try to add a data dir in /home for your use case and restart the datanodes.




Thanks & Regards

 Brahma Reddy Battula




________________________________
From: Naganarasimha G R (Naga) [garlanaganarasimha@huawei.com]
Sent: Friday, November 06, 2015 7:20 AM
To: user@hadoop.apache.org
Subject: RE: hadoop not using whole disk for HDFS


Hi Bob,



1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.

Naga : I am not sure about the HDP distro, but if you make it point to /hdfs/data, it will still be pointing to the root mount itself, i.e.

    /dev/mapper/centos-root     50G     12G     39G     23%     /

The other alternative is to mount the drive to some folder other than /home and then try.



2. When I restarted, the space available increased by a whopping 100GB.

Naga : I am not particularly sure how this happened. You can recheck by running "df -h <path of the configured data dir>"; that will show how much disk space is available on the mount on which the configured path resides.



Regards,

+ Naga







________________________________

From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Friday, November 06, 2015 06:54
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS

Is there a maximum amount of disk space that HDFS will use? Is 100GB that max? When we’re supposed to be dealing with “big data” why is the amount of data to be held on any one box such a small number when you’ve got terabytes available?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Adaryl "Bob" Wakefield, MBA<ma...@hotmail.com>
Sent: Wednesday, November 04, 2015 4:38 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: hadoop not using whole disk for HDFS

This is an experimental cluster and there isn’t anything I can’t lose. I ran into some issues. I’m running the Hortonworks distro and am managing things through Ambari.

1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.
2. When I restarted, the space available increased by a whopping 100GB.



Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga)<ma...@huawei.com>
Sent: Wednesday, November 04, 2015 4:26 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: hadoop not using whole disk for HDFS


If the amount of data is comparatively small, a better approach would be to stop the daemons, copy the data from /hadoop/hdfs/data to /home/hdfs/data, reconfigure dfs.datanode.data.dir to /home/hdfs/data, and then start the daemons again.

Ensure you have a backup if you have any critical data!



Regards,

+ Naga

________________________________
From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Thursday, November 05, 2015 03:40
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS

So like I can just create a new folder in the home directory like:
home/hdfs/data
and then set dfs.datanode.data.dir to:
/hadoop/hdfs/data,home/hdfs/data

Restart the node and that should do it correct?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga)<ma...@huawei.com>
Sent: Wednesday, November 04, 2015 3:59 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: hadoop not using whole disk for HDFS


Hi Bob,



It seems you have configured the data dir to be something other than a folder in /home. If so, try creating another folder there and adding it to "dfs.datanode.data.dir", separated by a comma, instead of trying to reset the default.

It is also advisable not to configure the root partition "/" as an HDFS data dir; if usage of that dir hits its maximum, the OS might fail to function properly.



Regards,

+ Naga

________________________________
From: P lva [ruvikal@gmail.com]
Sent: Thursday, November 05, 2015 03:11
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS

What does your dfs.datanode.data.dir point to ?


On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com>> wrote:
Filesystem      Size    Used    Avail   Use%    Mounted on
/dev/mapper/centos-root 50G     12G     39G     23%     /
devtmpfs        16G     0       16G     0%      /dev
tmpfs   16G     0       16G     0%      /dev/shm
tmpfs   16G     1.4G    15G     9%      /run
tmpfs   16G     0       16G     0%      /sys/fs/cgroup
/dev/sda2       494M    123M    372M    25%     /boot
/dev/mapper/centos-home 2.7T    33M     2.7T    1%      /home

That’s from one datanode. The second one is nearly identical. I discovered that 50GB is actually a default. That seems really weird. Disk space is cheap. Why would you not just use most of the disk and why is it so hard to reset the default?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685<tel:913.938.6685>
www.linkedin.com/in/bobwakefieldmba<http://www.linkedin.com/in/bobwakefieldmba>
Twitter: @BobLovesData

From: Chris Nauroth<ma...@hortonworks.com>
Sent: Wednesday, November 04, 2015 12:16 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: hadoop not using whole disk for HDFS

How are those drives partitioned?  Is it possible that the directories pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on partitions that are sized to only 100 GB?  Running commands like df would be a good way to check this at the OS level, independently of Hadoop.

--Chris Nauroth

From: MBA <ad...@hotmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Tuesday, November 3, 2015 at 11:16 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Re: hadoop not using whole disk for HDFS

Yeah. It has the current value of 1073741824 which is like 1.07 gig.

B.
From: Chris Nauroth<ma...@hortonworks.com>
Sent: Tuesday, November 03, 2015 11:57 AM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: hadoop not using whole disk for HDFS

Hi Bob,

Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>0</value>
  <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
  </description>
</property>

--Chris Nauroth

From: MBA <ad...@hotmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Tuesday, November 3, 2015 at 10:51 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: hadoop not using whole disk for HDFS

I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere?

B.


Re: hadoop not using whole disk for HDFS

Posted by "Adaryl \"Bob\" Wakefield, MBA" <ad...@hotmail.com>.
By manually you mean actually going in with nano and editing the config file? I could do that but if Ambari won’t let you do it through the interface, isn’t it possible that trying to add the directory in home might break something?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Brahma Reddy Battula 
Sent: Thursday, November 05, 2015 8:49 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS


For each configured dfs.datanode.data.dir, HDFS assumes the directory is on a separate partition and counts its capacity separately. So when another dir, /hdfs/data, was added, HDFS assumed a new partition had been added and increased the capacity by 50GB per node, i.e. 100GB across the 2 nodes.

Not allowing the /home directory to be configured for data.dir might be an Ambari constraint; instead, you can manually try to add a data dir in /home for your use case and restart the datanodes.





Thanks & Regards

 Brahma Reddy Battula







--------------------------------------------------------------------------------

From: Naganarasimha G R (Naga) [garlanaganarasimha@huawei.com]
Sent: Friday, November 06, 2015 7:20 AM
To: user@hadoop.apache.org
Subject: RE: hadoop not using whole disk for HDFS


Hi Bob,



1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.

Naga : I am not sure about the HDP distro, but if you make it point to /hdfs/data, it will still be pointing to the root mount itself, i.e.

          /dev/mapper/centos-root 50G 12G 39G 23% / 


The other alternative is to mount the drive to some folder other than /home and then try.



2. When I restarted, the space available increased by a whopping 100GB.
Naga : I am not particularly sure how this happened. You can recheck by running "df -h <path of the configured data dir>"; that will show how much disk space is available on the mount on which the configured path resides.



Regards,

+ Naga








--------------------------------------------------------------------------------

From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Friday, November 06, 2015 06:54
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


Is there a maximum amount of disk space that HDFS will use? Is 100GB that max? When we’re supposed to be dealing with “big data” why is the amount of data to be held on any one box such a small number when you’ve got terabytes available?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Adaryl "Bob" Wakefield, MBA 
Sent: Wednesday, November 04, 2015 4:38 PM
To: user@hadoop.apache.org 
Subject: Re: hadoop not using whole disk for HDFS

This is an experimental cluster and there isn’t anything I can’t lose. I ran into some issues. I’m running the Hortonworks distro and am managing things through Ambari. 

1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.
2. When I restarted, the space available increased by a whopping 100GB.



Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Wednesday, November 04, 2015 4:26 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

If the amount of data is comparatively small, a better approach would be to stop the daemons, copy the data from /hadoop/hdfs/data to /home/hdfs/data, reconfigure dfs.datanode.data.dir to /home/hdfs/data, and then start the daemons again.

Ensure you have a backup if you have any critical data!



Regards,

+ Naga


--------------------------------------------------------------------------------

From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Thursday, November 05, 2015 03:40
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


So like I can just create a new folder in the home directory like:
home/hdfs/data
and then set dfs.datanode.data.dir to:
/hadoop/hdfs/data,home/hdfs/data

Restart the node and that should do it correct?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Wednesday, November 04, 2015 3:59 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Hi Bob,



It seems you have configured the data dir to be something other than a folder in /home. If so, try creating another folder there and adding it to "dfs.datanode.data.dir", separated by a comma, instead of trying to reset the default.

It is also advisable not to configure the root partition "/" as an HDFS data dir; if usage of that dir hits its maximum, the OS might fail to function properly.



Regards,

+ Naga


--------------------------------------------------------------------------------

From: P lva [ruvikal@gmail.com]
Sent: Thursday, November 05, 2015 03:11
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


What does your dfs.datanode.data.dir point to ?



On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com> wrote:

        Filesystem Size Used Avail Use% Mounted on 
        /dev/mapper/centos-root 50G 12G 39G 23% / 
        devtmpfs 16G 0 16G 0% /dev 
        tmpfs 16G 0 16G 0% /dev/shm 
        tmpfs 16G 1.4G 15G 9% /run 
        tmpfs 16G 0 16G 0% /sys/fs/cgroup 
        /dev/sda2 494M 123M 372M 25% /boot 
        /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home 


  That’s from one datanode. The second one is nearly identical. I discovered that 50GB is actually a default. That seems really weird. Disk space is cheap. Why would you not just use most of the disk and why is it so hard to reset the default?

  Adaryl "Bob" Wakefield, MBA
  Principal
  Mass Street Analytics, LLC
  913.938.6685
  www.linkedin.com/in/bobwakefieldmba
  Twitter: @BobLovesData

  From: Chris Nauroth 
  Sent: Wednesday, November 04, 2015 12:16 PM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  How are those drives partitioned?  Is it possible that the directories pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on partitions that are sized to only 100 GB?  Running commands like df would be a good way to check this at the OS level, independently of Hadoop.

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 11:16 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: Re: hadoop not using whole disk for HDFS


  Yeah. It has the current value of 1073741824 which is like 1.07 gig.

  B.
  From: Chris Nauroth 
  Sent: Tuesday, November 03, 2015 11:57 AM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  Hi Bob,

  Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>0</value>
    <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
    </description>
  </property>

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 10:51 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: hadoop not using whole disk for HDFS


  I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere? 
  B.

RE: hadoop not using whole disk for HDFS

Posted by "Naganarasimha G R (Naga)" <ga...@huawei.com>.
Thanks Brahma, dint realize he might have configured both directories and i was assuming bob has configured single new directory "/hdfs/data".
So virtually its showing additional space,
manually try to add a data dir in /home, for your usecase, and restart datanodes.
Not sure about the impacs in Ambari but worth a try! , more permanent solution would be better remount
Filesystem      Size    Used    Avail   Use%    Mounted on
/dev/mapper/centos-home 2.7T    33M     2.7T    1%      /home


RE: hadoop not using whole disk for HDFS

Posted by "Naganarasimha G R (Naga)" <ga...@huawei.com>.
Thanks Brahma, I didn't realize he might have configured both directories; I was assuming Bob had configured a single new directory, "/hdfs/data".
So virtually it is showing additional space.
You can manually try to add a data dir in /home for your use case and restart the datanodes.
Not sure about the impacts in Ambari, but it's worth a try! A more permanent solution would be to remount the large volume:
Filesystem      Size    Used    Avail   Use%    Mounted on
/dev/mapper/centos-home 2.7T    33M     2.7T    1%      /home
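
A rough sketch of the manual route (the user/group names and the restart step are assumptions here; adjust for your install and take a config backup first):

    # create the new data dir on the large /home mount and hand it to the HDFS user
    mkdir -p /home/hdfs/data
    chown -R hdfs:hadoop /home/hdfs/data

    # in hdfs-site.xml, list both locations under dfs.datanode.data.dir, comma separated:
    #   <property>
    #     <name>dfs.datanode.data.dir</name>
    #     <value>/hadoop/hdfs/data,/home/hdfs/data</value>
    #   </property>

    # restart the DataNode (through Ambari, or hadoop-daemon.sh on the node), then verify:
    hdfs dfsadmin -report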


Re: hadoop not using whole disk for HDFS

Posted by "Adaryl \"Bob\" Wakefield, MBA" <ad...@hotmail.com>.
By manually, do you mean actually going in with nano and editing the config file? I could do that, but if Ambari won’t let you do it through the interface, isn’t it possible that trying to add the directory in /home might break something?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData


RE: hadoop not using whole disk for HDFS

Posted by Brahma Reddy Battula <br...@huawei.com>.
For each configured dfs.datanode.data.dir, HDFS assumes it is on a separate partition and counts the capacity separately. So when another dir, /hdfs/data, was added, HDFS assumed a new partition had been added and increased the capacity by 50GB per node, i.e. 100GB for 2 nodes.

Not allowing the /home directory to be configured for data.dir might be Ambari's constraint; instead, you can manually try to add a data dir in /home for your use case and restart the datanodes.
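
As a quick sanity check (assuming the two directories currently configured), you can pass each data dir to df to see which partition it really lives on; with the layout posted in this thread, both resolve to the same 50GB root mount:

    df -h /hadoop/hdfs/data /hdfs/data
    # Filesystem               Size  Used  Avail  Use%  Mounted on
    # /dev/mapper/centos-root   50G   12G    39G   23%  /
    # /dev/mapper/centos-root   50G   12G    39G   23%  /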




Thanks & Regards

 Brahma Reddy Battula






Re: hadoop not using whole disk for HDFS

Posted by "Adaryl \"Bob\" Wakefield, MBA" <ad...@hotmail.com>.
I think it might help if I had a better understanding of what I’m looking at:
      /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home 


So /dev/mapper/centos-home is the file system and /home is where it is mounted. I’m not sure I even know what that means. Are you saying that /hdfs/data, even though it’s in root, is still somehow pointing to /home? So confused. It’s the part about mounting a drive to another folder... on the same disk. Is it kind of like how on Windows you can have more than one “drive” on a disk?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData
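
For reference: a path's data lives on whichever file system is mounted at the closest containing mount point, much like separate drives on Windows. One way to check a given directory is to pass it to df; a sketch, with the paths assumed from this thread:

    df -h /hdfs/data         # resolves to /dev/mapper/centos-root, the 50G volume mounted on /
    df -h /home/hdfs/data    # would resolve to /dev/mapper/centos-home, the 2.7T volume mounted on /home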



  I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere? 
  B.

RE: hadoop not using whole disk for HDFS

Posted by Brahma Reddy Battula <br...@huawei.com>.
For each directory configured in dfs.datanode.data.dir, HDFS assumes it is on a separate partition and counts its capacity separately. So when another dir, /hdfs/data, was added, HDFS treated it as a new partition and increased the capacity by 50GB per node, i.e. 100GB across the 2 nodes.

Not allowing the /home directory to be configured for data.dir might be an Ambari constraint; for your use case you can manually add a data dir under /home and restart the datanodes.
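
As a rough sketch of that manual route, using the paths already mentioned in this thread (the hdfs:hadoop ownership is the usual default and may differ on your install; if Ambari manages hdfs-site.xml, the property has to be changed there rather than by editing the file directly):

    # on each datanode
    mkdir -p /home/hdfs/data
    chown -R hdfs:hadoop /home/hdfs/data

    <!-- hdfs-site.xml -->
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/hadoop/hdfs/data,/home/hdfs/data</value>
    </property>

After restarting the DataNodes, hdfs dfsadmin -report should show the configured capacity jump to roughly the size of the centos-home volume instead of the 50G root volume.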




Thanks & Regards

 Brahma Reddy Battula




________________________________
From: Naganarasimha G R (Naga) [garlanaganarasimha@huawei.com]
Sent: Friday, November 06, 2015 7:20 AM
To: user@hadoop.apache.org
Subject: RE: hadoop not using whole disk for HDFS


Hi Bob,



1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.

Naga : I am not sure about the HDP Distro but if you make it point to /hdfs/data, still it will be pointing to the root mount itself i.e.

    /dev/mapper/centos-root     50G     12G     39G     23%     /

Other Alternative is to mount the drive to some other folder other than /home and then try.



2. When I restarted, the space available increased by a whopping 100GB.

Naga : I am particularly not sure how this happened may be you can again recheck if you enter the command "df -h <path of the NM data dir configured>"  you will find out how much disk space is available on the related mount for which the path is configured.



Regards,

+ Naga







________________________________

From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Friday, November 06, 2015 06:54
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS

Is there a maximum amount of disk space that HDFS will use? Is 100GB that max? When we’re supposed to be dealing with “big data” why is the amount of data to be held on any one box such a small number when you’ve got terabytes available?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Adaryl "Bob" Wakefield, MBA<ma...@hotmail.com>
Sent: Wednesday, November 04, 2015 4:38 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: hadoop not using whole disk for HDFS

This is an experimental cluster and there isn’t anything I can’t lose. I ran into some issues. I’m running the Hortonworks distro and am managing things through Ambari.

1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.
2. When I restarted, the space available increased by a whopping 100GB.



Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga)<ma...@huawei.com>
Sent: Wednesday, November 04, 2015 4:26 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: hadoop not using whole disk for HDFS


Better would be to stop the daemons and copy the data from /hadoop/hdfs/data to /home/hdfs/data , reconfigure dfs.datanode.data.dir to /home/hdfs/data and then start the daemons. If the data is comparitively less !

Ensure you have the backup if have any critical data !



Regards,

+ Naga

________________________________
From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Thursday, November 05, 2015 03:40
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS

So like I can just create a new folder in the home directory like:
home/hdfs/data
and then set dfs.datanode.data.dir to:
/hadoop/hdfs/data,home/hdfs/data

Restart the node and that should do it correct?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga)<ma...@huawei.com>
Sent: Wednesday, November 04, 2015 3:59 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: hadoop not using whole disk for HDFS


Hi Bob,



Seems like you have configured to disk dir to be other than an folder in /home, if so try creating another folder and add to "dfs.datanode.data.dir" seperated by comma instead of trying to reset the default.

And its also advised not to use the root partition "/" to be configured for HDFS data dir, if the Dir usage hits the maximum then OS might fail to function properly.



Regards,

+ Naga

________________________________
From: P lva [ruvikal@gmail.com]
Sent: Thursday, November 05, 2015 03:11
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS

What does your dfs.datanode.data.dir point to ?


On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com>> wrote:
Filesystem      Size    Used    Avail   Use%    Mounted on
/dev/mapper/centos-root 50G     12G     39G     23%     /
devtmpfs        16G     0       16G     0%      /dev
tmpfs   16G     0       16G     0%      /dev/shm
tmpfs   16G     1.4G    15G     9%      /run
tmpfs   16G     0       16G     0%      /sys/fs/cgroup
/dev/sda2       494M    123M    372M    25%     /boot
/dev/mapper/centos-home 2.7T    33M     2.7T    1%      /home

That’s from one datanode. The second one is nearly identical. I discovered that 50GB is actually a default. That seems really weird. Disk space is cheap. Why would you not just use most of the disk and why is it so hard to reset the default?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685<tel:913.938.6685>
www.linkedin.com/in/bobwakefieldmba<http://www.linkedin.com/in/bobwakefieldmba>
Twitter: @BobLovesData

From: Chris Nauroth<ma...@hortonworks.com>
Sent: Wednesday, November 04, 2015 12:16 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: hadoop not using whole disk for HDFS

How are those drives partitioned?  Is it possible that the directories pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on partitions that are sized to only 100 GB?  Running commands like df would be a good way to check this at the OS level, independently of Hadoop.

--Chris Nauroth

From: MBA <ad...@hotmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Tuesday, November 3, 2015 at 11:16 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Re: hadoop not using whole disk for HDFS

Yeah. It has the current value of 1073741824 which is like 1.07 gig.

B.
From: Chris Nauroth<ma...@hortonworks.com>
Sent: Tuesday, November 03, 2015 11:57 AM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: hadoop not using whole disk for HDFS

Hi Bob,

Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>0</value>
  <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
  </description>
</property>

--Chris Nauroth

From: MBA <ad...@hotmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Tuesday, November 3, 2015 at 10:51 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: hadoop not using whole disk for HDFS

I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere?

B.


RE: hadoop not using whole disk for HDFS

Posted by "Naganarasimha G R (Naga)" <ga...@huawei.com>.
Hi Bob,



1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.

Naga : I am not sure about the HDP distro, but if you make it point to /hdfs/data it will still be on the root mount itself, i.e.

    /dev/mapper/centos-root     50G     12G     39G     23%     /

The other alternative is to mount the drive at some folder other than /home and then try.



2. When I restarted, the space available increased by a whopping 100GB.

Naga : I am not sure exactly how this happened; maybe you can recheck by running "df -h <path of the DataNode data dir configured>", which will show how much disk space is available on the mount that backs the configured path.
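
For the "mount the drive to some other folder" alternative, a rough sketch only (device and mount names are taken from the df output in this thread; back up anything under /home and double-check /etc/fstab before changing mounts):

    mkdir /data
    umount /home        # only safe if nothing you need is stored under /home yet
    # in /etc/fstab, change the mount point for /dev/mapper/centos-home from /home to /data
    mount -a
    df -h /data         # should now show the ~2.7T volume

dfs.datanode.data.dir could then point at something like /data/hdfs/data, which sits outside both / and /home.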



Regards,

+ Naga







________________________________

From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Friday, November 06, 2015 06:54
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS

Is there a maximum amount of disk space that HDFS will use? Is 100GB that max? When we’re supposed to be dealing with “big data” why is the amount of data to be held on any one box such a small number when you’ve got terabytes available?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Adaryl "Bob" Wakefield, MBA<ma...@hotmail.com>
Sent: Wednesday, November 04, 2015 4:38 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: hadoop not using whole disk for HDFS

This is an experimental cluster and there isn’t anything I can’t lose. I ran into some issues. I’m running the Hortonworks distro and am managing things through Ambari.

1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.
2. When I restarted, the space available increased by a whopping 100GB.



Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga)<ma...@huawei.com>
Sent: Wednesday, November 04, 2015 4:26 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: hadoop not using whole disk for HDFS


Better would be to stop the daemons and copy the data from /hadoop/hdfs/data to /home/hdfs/data , reconfigure dfs.datanode.data.dir to /home/hdfs/data and then start the daemons. If the data is comparitively less !

Ensure you have the backup if have any critical data !



Regards,

+ Naga

________________________________
From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Thursday, November 05, 2015 03:40
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS

So like I can just create a new folder in the home directory like:
home/hdfs/data
and then set dfs.datanode.data.dir to:
/hadoop/hdfs/data,home/hdfs/data

Restart the node and that should do it correct?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga)<ma...@huawei.com>
Sent: Wednesday, November 04, 2015 3:59 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: hadoop not using whole disk for HDFS


Hi Bob,



Seems like you have configured to disk dir to be other than an folder in /home, if so try creating another folder and add to "dfs.datanode.data.dir" seperated by comma instead of trying to reset the default.

And its also advised not to use the root partition "/" to be configured for HDFS data dir, if the Dir usage hits the maximum then OS might fail to function properly.



Regards,

+ Naga

________________________________
From: P lva [ruvikal@gmail.com]
Sent: Thursday, November 05, 2015 03:11
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS

What does your dfs.datanode.data.dir point to ?


On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com>> wrote:
Filesystem      Size    Used    Avail   Use%    Mounted on
/dev/mapper/centos-root 50G     12G     39G     23%     /
devtmpfs        16G     0       16G     0%      /dev
tmpfs   16G     0       16G     0%      /dev/shm
tmpfs   16G     1.4G    15G     9%      /run
tmpfs   16G     0       16G     0%      /sys/fs/cgroup
/dev/sda2       494M    123M    372M    25%     /boot
/dev/mapper/centos-home 2.7T    33M     2.7T    1%      /home

That’s from one datanode. The second one is nearly identical. I discovered that 50GB is actually a default. That seems really weird. Disk space is cheap. Why would you not just use most of the disk and why is it so hard to reset the default?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685<tel:913.938.6685>
www.linkedin.com/in/bobwakefieldmba<http://www.linkedin.com/in/bobwakefieldmba>
Twitter: @BobLovesData

From: Chris Nauroth<ma...@hortonworks.com>
Sent: Wednesday, November 04, 2015 12:16 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: hadoop not using whole disk for HDFS

How are those drives partitioned?  Is it possible that the directories pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on partitions that are sized to only 100 GB?  Running commands like df would be a good way to check this at the OS level, independently of Hadoop.

--Chris Nauroth

From: MBA <ad...@hotmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Tuesday, November 3, 2015 at 11:16 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Re: hadoop not using whole disk for HDFS

Yeah. It has the current value of 1073741824 which is like 1.07 gig.

B.
From: Chris Nauroth<ma...@hortonworks.com>
Sent: Tuesday, November 03, 2015 11:57 AM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: hadoop not using whole disk for HDFS

Hi Bob,

Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>0</value>
  <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
  </description>
</property>

--Chris Nauroth

From: MBA <ad...@hotmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Tuesday, November 3, 2015 at 10:51 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: hadoop not using whole disk for HDFS

I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere?

B.


Re: hadoop not using whole disk for HDFS

Posted by iain wright <ia...@gmail.com>.
Please post:
- output of df -h from every datanode in your cluster
- what dfs.datanode.data.dir is currently set to
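
A quick way to collect both on each datanode, assuming a standard HDFS client is on the PATH:

    df -h
    hdfs getconf -confKey dfs.datanode.data.dir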

-- 
Iain Wright

This email message is confidential, intended only for the recipient(s)
named above and may contain information that is privileged, exempt from
disclosure under applicable law. If you are not the intended recipient, do
not disclose or disseminate the message to anyone except the intended
recipient. If you have received this message in error, or are not the named
recipient(s), please immediately notify the sender by return email, and
delete all copies of this message.

On Thu, Nov 5, 2015 at 5:24 PM, Adaryl "Bob" Wakefield, MBA <
adaryl.wakefield@hotmail.com> wrote:

> Is there a maximum amount of disk space that HDFS will use? Is 100GB that
> max? When we’re supposed to be dealing with “big data” why is the amount of
> data to be held on any one box such a small number when you’ve got
> terabytes available?
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
> *From:* Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com>
> *Sent:* Wednesday, November 04, 2015 4:38 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: hadoop not using whole disk for HDFS
>
> This is an experimental cluster and there isn’t anything I can’t lose. I
> ran into some issues. I’m running the Hortonworks distro and am managing
> things through Ambari.
>
> 1. I wasn’t able to set the config to /home/hdfs/data. I got an error that
> told me I’m not allowed to set that config to the /home directory. So I
> made it /hdfs/data.
> 2. When I restarted, the space available increased by a whopping 100GB.
>
>
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
> *From:* Naganarasimha G R (Naga) <ga...@huawei.com>
> *Sent:* Wednesday, November 04, 2015 4:26 PM
> *To:* user@hadoop.apache.org
> *Subject:* RE: hadoop not using whole disk for HDFS
>
>
> Better would be to stop the daemons and copy the data from */hadoop/hdfs/data
> *to */home/hdfs/data *, reconfigure *dfs.datanode.data.dir* to */home/hdfs/data
> *and then start the daemons. If the data is comparitively less !
>
> Ensure you have the backup if have any critical data !
>
>
>
> Regards,
>
> + Naga
> ------------------------------
> *From:* Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
> *Sent:* Thursday, November 05, 2015 03:40
> *To:* user@hadoop.apache.org
> *Subject:* Re: hadoop not using whole disk for HDFS
>
> So like I can just create a new folder in the home directory like:
> home/hdfs/data
> and then set dfs.datanode.data.dir to:
> /hadoop/hdfs/data,home/hdfs/data
>
> Restart the node and that should do it correct?
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
> *From:* Naganarasimha G R (Naga) <ga...@huawei.com>
> *Sent:* Wednesday, November 04, 2015 3:59 PM
> *To:* user@hadoop.apache.org
> *Subject:* RE: hadoop not using whole disk for HDFS
>
>
> Hi Bob,
>
>
>
> Seems like you have configured to disk dir to be other than an folder in*
> /home,* if so try creating another folder and add to
> *"dfs.datanode.data.dir"* seperated by comma instead of trying to reset
> the default.
>
> And its also advised not to use the root partition "/" to be configured
> for HDFS data dir, if the Dir usage hits the maximum then OS might fail to
> function properly.
>
>
>
> Regards,
>
> + Naga
> ------------------------------
> *From:* P lva [ruvikal@gmail.com]
> *Sent:* Thursday, November 05, 2015 03:11
> *To:* user@hadoop.apache.org
> *Subject:* Re: hadoop not using whole disk for HDFS
>
> What does your dfs.datanode.data.dir point to ?
>
>
> On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <
> adaryl.wakefield@hotmail.com> wrote:
>
>> Filesystem Size Used Avail Use% Mounted on /dev/mapper/centos-root 50G
>> 12G 39G 23% / devtmpfs 16G 0 16G 0% /dev tmpfs 16G 0 16G 0% /dev/shm
>> tmpfs 16G 1.4G 15G 9% /run tmpfs 16G 0 16G 0% /sys/fs/cgroup /dev/sda2
>> 494M 123M 372M 25% /boot /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home
>>
>> That’s from one datanode. The second one is nearly identical. I
>> discovered that 50GB is actually a default. That seems really weird. Disk
>> space is cheap. Why would you not just use most of the disk and why is it
>> so hard to reset the default?
>>
>> Adaryl "Bob" Wakefield, MBA
>> Principal
>> Mass Street Analytics, LLC
>> 913.938.6685
>> www.linkedin.com/in/bobwakefieldmba
>> Twitter: @BobLovesData
>>
>> *From:* Chris Nauroth <cn...@hortonworks.com>
>> *Sent:* Wednesday, November 04, 2015 12:16 PM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: hadoop not using whole disk for HDFS
>>
>> How are those drives partitioned?  Is it possible that the directories
>> pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on
>> partitions that are sized to only 100 GB?  Running commands like df would
>> be a good way to check this at the OS level, independently of Hadoop.
>>
>> --Chris Nauroth
>>
>> From: MBA <ad...@hotmail.com>
>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Date: Tuesday, November 3, 2015 at 11:16 AM
>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Subject: Re: hadoop not using whole disk for HDFS
>>
>> Yeah. It has the current value of 1073741824 which is like 1.07 gig.
>>
>> B.
>> *From:* Chris Nauroth <cn...@hortonworks.com>
>> *Sent:* Tuesday, November 03, 2015 11:57 AM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: hadoop not using whole disk for HDFS
>>
>> Hi Bob,
>>
>> Does the hdfs-site.xml configuration file contain the property
>> dfs.datanode.du.reserved?  If this is defined, then the DataNode
>> intentionally will not use this space for storage of replicas.
>>
>> <property>
>>   <name>dfs.datanode.du.reserved</name>
>>   <value>0</value>
>>   <description>Reserved space in bytes per volume. Always leave this much
>> space free for non dfs use.
>>   </description>
>> </property>
>>
>> --Chris Nauroth
>>
>> From: MBA <ad...@hotmail.com>
>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Date: Tuesday, November 3, 2015 at 10:51 AM
>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Subject: hadoop not using whole disk for HDFS
>>
>> I’ve got the Hortonworks distro running on a three node cluster. For some
>> reason the disk available for HDFS is MUCH less than the total disk space.
>> Both of my data nodes have 3TB hard drives. Only 100GB of that is being
>> used for HDFS. Is it possible that I have a setting wrong somewhere?
>>
>> B.
>>
>
>

Re: hadoop not using whole disk for HDFS

Posted by iain wright <ia...@gmail.com>.
Please post:
- output of df -h from every datanode in your cluster
- what dfs.datanode.data.dir is currently set too

-- 
Iain Wright

This email message is confidential, intended only for the recipient(s)
named above and may contain information that is privileged, exempt from
disclosure under applicable law. If you are not the intended recipient, do
not disclose or disseminate the message to anyone except the intended
recipient. If you have received this message in error, or are not the named
recipient(s), please immediately notify the sender by return email, and
delete all copies of this message.

On Thu, Nov 5, 2015 at 5:24 PM, Adaryl "Bob" Wakefield, MBA <
adaryl.wakefield@hotmail.com> wrote:

> Is there a maximum amount of disk space that HDFS will use? Is 100GB that
> max? When we’re supposed to be dealing with “big data” why is the amount of
> data to be held on any one box such a small number when you’ve got
> terabytes available?
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
> *From:* Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com>
> *Sent:* Wednesday, November 04, 2015 4:38 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: hadoop not using whole disk for HDFS
>
> This is an experimental cluster and there isn’t anything I can’t lose. I
> ran into some issues. I’m running the Hortonworks distro and am managing
> things through Ambari.
>
> 1. I wasn’t able to set the config to /home/hdfs/data. I got an error that
> told me I’m not allowed to set that config to the /home directory. So I
> made it /hdfs/data.
> 2. When I restarted, the space available increased by a whopping 100GB.
>
>
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
> *From:* Naganarasimha G R (Naga) <ga...@huawei.com>
> *Sent:* Wednesday, November 04, 2015 4:26 PM
> *To:* user@hadoop.apache.org
> *Subject:* RE: hadoop not using whole disk for HDFS
>
>
> Better would be to stop the daemons and copy the data from */hadoop/hdfs/data
> *to */home/hdfs/data *, reconfigure *dfs.datanode.data.dir* to */home/hdfs/data
> *and then start the daemons. If the data is comparitively less !
>
> Ensure you have the backup if have any critical data !
>
>
>
> Regards,
>
> + Naga
> ------------------------------
> *From:* Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
> *Sent:* Thursday, November 05, 2015 03:40
> *To:* user@hadoop.apache.org
> *Subject:* Re: hadoop not using whole disk for HDFS
>
> So like I can just create a new folder in the home directory like:
> home/hdfs/data
> and then set dfs.datanode.data.dir to:
> /hadoop/hdfs/data,home/hdfs/data
>
> Restart the node and that should do it correct?
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
> *From:* Naganarasimha G R (Naga) <ga...@huawei.com>
> *Sent:* Wednesday, November 04, 2015 3:59 PM
> *To:* user@hadoop.apache.org
> *Subject:* RE: hadoop not using whole disk for HDFS
>
>
> Hi Bob,
>
>
>
> Seems like you have configured to disk dir to be other than an folder in*
> /home,* if so try creating another folder and add to
> *"dfs.datanode.data.dir"* seperated by comma instead of trying to reset
> the default.
>
> And its also advised not to use the root partition "/" to be configured
> for HDFS data dir, if the Dir usage hits the maximum then OS might fail to
> function properly.
>
>
>
> Regards,
>
> + Naga
> ------------------------------
> *From:* P lva [ruvikal@gmail.com]
> *Sent:* Thursday, November 05, 2015 03:11
> *To:* user@hadoop.apache.org
> *Subject:* Re: hadoop not using whole disk for HDFS
>
> What does your dfs.datanode.data.dir point to ?
>
>
> On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <
> adaryl.wakefield@hotmail.com> wrote:
>
>> Filesystem Size Used Avail Use% Mounted on /dev/mapper/centos-root 50G
>> 12G 39G 23% / devtmpfs 16G 0 16G 0% /dev tmpfs 16G 0 16G 0% /dev/shm
>> tmpfs 16G 1.4G 15G 9% /run tmpfs 16G 0 16G 0% /sys/fs/cgroup /dev/sda2
>> 494M 123M 372M 25% /boot /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home
>>
>> That’s from one datanode. The second one is nearly identical. I
>> discovered that 50GB is actually a default. That seems really weird. Disk
>> space is cheap. Why would you not just use most of the disk and why is it
>> so hard to reset the default?
>>
>> Adaryl "Bob" Wakefield, MBA
>> Principal
>> Mass Street Analytics, LLC
>> 913.938.6685
>> www.linkedin.com/in/bobwakefieldmba
>> Twitter: @BobLovesData
>>
>> *From:* Chris Nauroth <cn...@hortonworks.com>
>> *Sent:* Wednesday, November 04, 2015 12:16 PM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: hadoop not using whole disk for HDFS
>>
>> How are those drives partitioned?  Is it possible that the directories
>> pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on
>> partitions that are sized to only 100 GB?  Running commands like df would
>> be a good way to check this at the OS level, independently of Hadoop.
>>
>> --Chris Nauroth
>>
>> From: MBA <ad...@hotmail.com>
>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Date: Tuesday, November 3, 2015 at 11:16 AM
>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Subject: Re: hadoop not using whole disk for HDFS
>>
>> Yeah. It has the current value of 1073741824 which is like 1.07 gig.
>>
>> B.
>> *From:* Chris Nauroth <cn...@hortonworks.com>
>> *Sent:* Tuesday, November 03, 2015 11:57 AM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: hadoop not using whole disk for HDFS
>>
>> Hi Bob,
>>
>> Does the hdfs-site.xml configuration file contain the property
>> dfs.datanode.du.reserved?  If this is defined, then the DataNode
>> intentionally will not use this space for storage of replicas.
>>
>> <property>
>>   <name>dfs.datanode.du.reserved</name>
>>   <value>0</value>
>>   <description>Reserved space in bytes per volume. Always leave this much
>> space free for non dfs use.
>>   </description>
>> </property>
>>
>> --Chris Nauroth
>>
>> From: MBA <ad...@hotmail.com>
>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Date: Tuesday, November 3, 2015 at 10:51 AM
>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Subject: hadoop not using whole disk for HDFS
>>
>> I’ve got the Hortonworks distro running on a three node cluster. For some
>> reason the disk available for HDFS is MUCH less than the total disk space.
>> Both of my data nodes have 3TB hard drives. Only 100GB of that is being
>> used for HDFS. Is it possible that I have a setting wrong somewhere?
>>
>> B.
>>
>
>

Re: hadoop not using whole disk for HDFS

Posted by iain wright <ia...@gmail.com>.
Please post:
- output of df -h from every datanode in your cluster
- what dfs.datanode.data.dir is currently set too

-- 
Iain Wright

This email message is confidential, intended only for the recipient(s)
named above and may contain information that is privileged, exempt from
disclosure under applicable law. If you are not the intended recipient, do
not disclose or disseminate the message to anyone except the intended
recipient. If you have received this message in error, or are not the named
recipient(s), please immediately notify the sender by return email, and
delete all copies of this message.

On Thu, Nov 5, 2015 at 5:24 PM, Adaryl "Bob" Wakefield, MBA <
adaryl.wakefield@hotmail.com> wrote:

> Is there a maximum amount of disk space that HDFS will use? Is 100GB that
> max? When we’re supposed to be dealing with “big data” why is the amount of
> data to be held on any one box such a small number when you’ve got
> terabytes available?
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
> *From:* Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com>
> *Sent:* Wednesday, November 04, 2015 4:38 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: hadoop not using whole disk for HDFS
>
> This is an experimental cluster and there isn’t anything I can’t lose. I
> ran into some issues. I’m running the Hortonworks distro and am managing
> things through Ambari.
>
> 1. I wasn’t able to set the config to /home/hdfs/data. I got an error that
> told me I’m not allowed to set that config to the /home directory. So I
> made it /hdfs/data.
> 2. When I restarted, the space available increased by a whopping 100GB.
>
>
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
> *From:* Naganarasimha G R (Naga) <ga...@huawei.com>
> *Sent:* Wednesday, November 04, 2015 4:26 PM
> *To:* user@hadoop.apache.org
> *Subject:* RE: hadoop not using whole disk for HDFS
>
>
> Better would be to stop the daemons and copy the data from */hadoop/hdfs/data*
> to */home/hdfs/data*, reconfigure *dfs.datanode.data.dir* to */home/hdfs/data*
> and then start the daemons, if the amount of data is comparatively small.
>
> Ensure you have a backup if you have any critical data!
>
>
>
> Regards,
>
> + Naga
> ------------------------------
> *From:* Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
> *Sent:* Thursday, November 05, 2015 03:40
> *To:* user@hadoop.apache.org
> *Subject:* Re: hadoop not using whole disk for HDFS
>
> So like I can just create a new folder in the home directory like:
> home/hdfs/data
> and then set dfs.datanode.data.dir to:
> /hadoop/hdfs/data,home/hdfs/data
>
> Restart the node and that should do it correct?
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
> *From:* Naganarasimha G R (Naga) <ga...@huawei.com>
> *Sent:* Wednesday, November 04, 2015 3:59 PM
> *To:* user@hadoop.apache.org
> *Subject:* RE: hadoop not using whole disk for HDFS
>
>
> Hi Bob,
>
>
>
> Seems like you have configured the data dir to be something other than a folder
> in */home*; if so, try creating another folder and adding it to
> *"dfs.datanode.data.dir"*, separated by a comma, instead of trying to reset
> the default.
>
> It is also advised not to use the root partition “/” for the HDFS data dir:
> if the dir usage hits the maximum, the OS might fail to function properly.
>
>
>
> Regards,
>
> + Naga
> ------------------------------
> *From:* P lva [ruvikal@gmail.com]
> *Sent:* Thursday, November 05, 2015 03:11
> *To:* user@hadoop.apache.org
> *Subject:* Re: hadoop not using whole disk for HDFS
>
> What does your dfs.datanode.data.dir point to ?
>
>
> On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <
> adaryl.wakefield@hotmail.com> wrote:
>
>> Filesystem               Size  Used  Avail  Use%  Mounted on
>> /dev/mapper/centos-root    50G   12G    39G   23%  /
>> devtmpfs                   16G     0    16G    0%  /dev
>> tmpfs                      16G     0    16G    0%  /dev/shm
>> tmpfs                      16G  1.4G    15G    9%  /run
>> tmpfs                      16G     0    16G    0%  /sys/fs/cgroup
>> /dev/sda2                 494M  123M   372M   25%  /boot
>> /dev/mapper/centos-home   2.7T   33M   2.7T    1%  /home
>>
>> That’s from one datanode. The second one is nearly identical. I
>> discovered that 50GB is actually a default. That seems really weird. Disk
>> space is cheap. Why would you not just use most of the disk and why is it
>> so hard to reset the default?
>>
>> Adaryl "Bob" Wakefield, MBA
>> Principal
>> Mass Street Analytics, LLC
>> 913.938.6685
>> www.linkedin.com/in/bobwakefieldmba
>> Twitter: @BobLovesData
>>
>> *From:* Chris Nauroth <cn...@hortonworks.com>
>> *Sent:* Wednesday, November 04, 2015 12:16 PM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: hadoop not using whole disk for HDFS
>>
>> How are those drives partitioned?  Is it possible that the directories
>> pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on
>> partitions that are sized to only 100 GB?  Running commands like df would
>> be a good way to check this at the OS level, independently of Hadoop.
>>
>> --Chris Nauroth
>>
>> From: MBA <ad...@hotmail.com>
>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Date: Tuesday, November 3, 2015 at 11:16 AM
>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Subject: Re: hadoop not using whole disk for HDFS
>>
>> Yeah. It has the current value of 1073741824 which is like 1.07 gig.
>>
>> B.
>> *From:* Chris Nauroth <cn...@hortonworks.com>
>> *Sent:* Tuesday, November 03, 2015 11:57 AM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: hadoop not using whole disk for HDFS
>>
>> Hi Bob,
>>
>> Does the hdfs-site.xml configuration file contain the property
>> dfs.datanode.du.reserved?  If this is defined, then the DataNode
>> intentionally will not use this space for storage of replicas.
>>
>> <property>
>>   <name>dfs.datanode.du.reserved</name>
>>   <value>0</value>
>>   <description>Reserved space in bytes per volume. Always leave this much
>> space free for non dfs use.
>>   </description>
>> </property>
>>
>> --Chris Nauroth
>>
>> From: MBA <ad...@hotmail.com>
>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Date: Tuesday, November 3, 2015 at 10:51 AM
>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Subject: hadoop not using whole disk for HDFS
>>
>> I’ve got the Hortonworks distro running on a three node cluster. For some
>> reason the disk available for HDFS is MUCH less than the total disk space.
>> Both of my data nodes have 3TB hard drives. Only 100GB of that is being
>> used for HDFS. Is it possible that I have a setting wrong somewhere?
>>
>> B.
>>
>
>

RE: hadoop not using whole disk for HDFS

Posted by "Naganarasimha G R (Naga)" <ga...@huawei.com>.
Hi Bob,



1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.

Naga : I am not sure about the HDP distro, but if you make it point to /hdfs/data, it will still be pointing to the root mount itself, i.e.

    /dev/mapper/centos-root     50G     12G     39G     23%     /

The other alternative is to mount the drive at some folder other than /home and then try.
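
A rough sketch of that alternative on these nodes (assumptions: nothing important lives under /home, the /data mount point and the hdfs:hadoop owner are illustrative choices, the commands run as root, and the DataNode is stopped first):

# repoint the large logical volume from /home to /data
umount /home
mkdir -p /data
mount /dev/mapper/centos-home /data
# also change the /home entry in /etc/fstab to /data so the mount survives a reboot

# create a DataNode directory on the new mount and hand it to the hdfs user
mkdir -p /data/hdfs/data
chown -R hdfs:hadoop /data/hdfs/data

# finally, in Ambari, set dfs.datanode.data.dir to
#   /hadoop/hdfs/data,/data/hdfs/data
# and restart HDFS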



2. When I restarted, the space available increased by a whopping 100GB.

Naga : I am not particularly sure how this happened. Maybe you can recheck: if you run “df -h <path of the DataNode data dir configured>” you will find out how much disk space is available on the mount backing the configured path.
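
For example, with the paths from this thread (illustrative only):

# /hdfs/data sits on the root logical volume, so this reports the 50G root filesystem
df -h /hdfs/data

# the 2.7T volume is the one mounted at /home
df -h /home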



Regards,

+ Naga







________________________________

From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Friday, November 06, 2015 06:54
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS

Is there a maximum amount of disk space that HDFS will use? Is 100GB that max? When we’re supposed to be dealing with “big data” why is the amount of data to be held on any one box such a small number when you’ve got terabytes available?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Adaryl "Bob" Wakefield, MBA<ma...@hotmail.com>
Sent: Wednesday, November 04, 2015 4:38 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: hadoop not using whole disk for HDFS

This is an experimental cluster and there isn’t anything I can’t lose. I ran into some issues. I’m running the Hortonworks distro and am managing things through Ambari.

1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.
2. When I restarted, the space available increased by a whopping 100GB.



Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga)<ma...@huawei.com>
Sent: Wednesday, November 04, 2015 4:26 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: hadoop not using whole disk for HDFS


Better would be to stop the daemons and copy the data from /hadoop/hdfs/data to /home/hdfs/data, reconfigure dfs.datanode.data.dir to /home/hdfs/data and then start the daemons, if the amount of data is comparatively small.

Ensure you have a backup if you have any critical data!



Regards,

+ Naga

________________________________
From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Thursday, November 05, 2015 03:40
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS

So like I can just create a new folder in the home directory like:
home/hdfs/data
and then set dfs.datanode.data.dir to:
/hadoop/hdfs/data,home/hdfs/data

Restart the node and that should do it correct?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga)<ma...@huawei.com>
Sent: Wednesday, November 04, 2015 3:59 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: hadoop not using whole disk for HDFS


Hi Bob,



Seems like you have configured the data dir to be something other than a folder in /home; if so, try creating another folder and adding it to "dfs.datanode.data.dir", separated by a comma, instead of trying to reset the default.

It is also advised not to use the root partition "/" for the HDFS data dir: if the dir usage hits the maximum, the OS might fail to function properly.



Regards,

+ Naga

________________________________
From: P lva [ruvikal@gmail.com]
Sent: Thursday, November 05, 2015 03:11
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS

What does your dfs.datanode.data.dir point to ?


On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com>> wrote:
Filesystem      Size    Used    Avail   Use%    Mounted on
/dev/mapper/centos-root 50G     12G     39G     23%     /
devtmpfs        16G     0       16G     0%      /dev
tmpfs   16G     0       16G     0%      /dev/shm
tmpfs   16G     1.4G    15G     9%      /run
tmpfs   16G     0       16G     0%      /sys/fs/cgroup
/dev/sda2       494M    123M    372M    25%     /boot
/dev/mapper/centos-home 2.7T    33M     2.7T    1%      /home

That’s from one datanode. The second one is nearly identical. I discovered that 50GB is actually a default. That seems really weird. Disk space is cheap. Why would you not just use most of the disk and why is it so hard to reset the default?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685<tel:913.938.6685>
www.linkedin.com/in/bobwakefieldmba<http://www.linkedin.com/in/bobwakefieldmba>
Twitter: @BobLovesData

From: Chris Nauroth<ma...@hortonworks.com>
Sent: Wednesday, November 04, 2015 12:16 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: hadoop not using whole disk for HDFS

How are those drives partitioned?  Is it possible that the directories pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on partitions that are sized to only 100 GB?  Running commands like df would be a good way to check this at the OS level, independently of Hadoop.

--Chris Nauroth

From: MBA <ad...@hotmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Tuesday, November 3, 2015 at 11:16 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Re: hadoop not using whole disk for HDFS

Yeah. It has the current value of 1073741824 which is like 1.07 gig.

B.
From: Chris Nauroth<ma...@hortonworks.com>
Sent: Tuesday, November 03, 2015 11:57 AM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: hadoop not using whole disk for HDFS

Hi Bob,

Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>0</value>
  <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
  </description>
</property>
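
The value is in bytes per volume, so reserving roughly 10 GB, for instance, would mean a value of 10737418240 (10 * 1024^3). A one-line sketch to see what a node currently has configured (assuming the hdfs command can see the cluster configuration):

hdfs getconf -confKey dfs.datanode.du.reserved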

--Chris Nauroth

From: MBA <ad...@hotmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Tuesday, November 3, 2015 at 10:51 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: hadoop not using whole disk for HDFS

I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere?

B.


Re: hadoop not using whole disk for HDFS

Posted by "Adaryl \"Bob\" Wakefield, MBA" <ad...@hotmail.com>.
Is there a maximum amount of disk space that HDFS will use? Is 100GB that max? When we’re supposed to be dealing with “big data” why is the amount of data to be held on any one box such a small number when you’ve got terabytes available?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Adaryl "Bob" Wakefield, MBA 
Sent: Wednesday, November 04, 2015 4:38 PM
To: user@hadoop.apache.org 
Subject: Re: hadoop not using whole disk for HDFS

This is an experimental cluster and there isn’t anything I can’t lose. I ran into some issues. I’m running the Hortonworks distro and am managing things through Ambari. 

1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.
2. When I restarted, the space available increased by a whopping 100GB.



Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Wednesday, November 04, 2015 4:26 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Better would be to stop the daemons and copy the data from /hadoop/hdfs/data to /home/hdfs/data, reconfigure dfs.datanode.data.dir to /home/hdfs/data and then start the daemons, if the amount of data is comparatively small.

Ensure you have a backup if you have any critical data!



Regards,

+ Naga


--------------------------------------------------------------------------------

From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Thursday, November 05, 2015 03:40
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


So like I can just create a new folder in the home directory like:
home/hdfs/data
and then set dfs.datanode.data.dir to:
/hadoop/hdfs/data,home/hdfs/data

Restart the node and that should do it correct?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Wednesday, November 04, 2015 3:59 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Hi Bob,



Seems like you have configured the data dir to be something other than a folder in /home; if so, try creating another folder and adding it to "dfs.datanode.data.dir", separated by a comma, instead of trying to reset the default.

It is also advised not to use the root partition "/" for the HDFS data dir: if the dir usage hits the maximum, the OS might fail to function properly.



Regards,

+ Naga


--------------------------------------------------------------------------------

From: P lva [ruvikal@gmail.com]
Sent: Thursday, November 05, 2015 03:11
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


What does your dfs.datanode.data.dir point to ?



On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com> wrote:

        Filesystem Size Used Avail Use% Mounted on 
        /dev/mapper/centos-root 50G 12G 39G 23% / 
        devtmpfs 16G 0 16G 0% /dev 
        tmpfs 16G 0 16G 0% /dev/shm 
        tmpfs 16G 1.4G 15G 9% /run 
        tmpfs 16G 0 16G 0% /sys/fs/cgroup 
        /dev/sda2 494M 123M 372M 25% /boot 
        /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home 


  That’s from one datanode. The second one is nearly identical. I discovered that 50GB is actually a default. That seems really weird. Disk space is cheap. Why would you not just use most of the disk and why is it so hard to reset the default?

  Adaryl "Bob" Wakefield, MBA
  Principal
  Mass Street Analytics, LLC
  913.938.6685
  www.linkedin.com/in/bobwakefieldmba
  Twitter: @BobLovesData

  From: Chris Nauroth 
  Sent: Wednesday, November 04, 2015 12:16 PM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  How are those drives partitioned?  Is it possible that the directories pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on partitions that are sized to only 100 GB?  Running commands like df would be a good way to check this at the OS level, independently of Hadoop.

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 11:16 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: Re: hadoop not using whole disk for HDFS


  Yeah. It has the current value of 1073741824 which is like 1.07 gig.

  B.
  From: Chris Nauroth 
  Sent: Tuesday, November 03, 2015 11:57 AM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  Hi Bob,

  Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>0</value>
    <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
    </description>
  </property>

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 10:51 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: hadoop not using whole disk for HDFS


  I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere? 
  B.

Re: hadoop not using whole disk for HDFS

Posted by "Adaryl \"Bob\" Wakefield, MBA" <ad...@hotmail.com>.
This is an experimental cluster and there isn’t anything I can’t lose. I ran into some issues. I’m running the Hortonworks distro and am managing things through Ambari. 

1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.
2. When I restarted, the space available increased by a whopping 100GB.



Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Wednesday, November 04, 2015 4:26 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Better would be to stop the daemons and copy the data from /hadoop/hdfs/data to /home/hdfs/data, reconfigure dfs.datanode.data.dir to /home/hdfs/data and then start the daemons, if the amount of data is comparatively small.

Ensure you have a backup if you have any critical data!



Regards,

+ Naga


--------------------------------------------------------------------------------

From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Thursday, November 05, 2015 03:40
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


So like I can just create a new folder in the home directory like:
home/hdfs/data
and then set dfs.datanode.data.dir to:
/hadoop/hdfs/data,home/hdfs/data

Restart the node and that should do it correct?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Wednesday, November 04, 2015 3:59 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Hi Bob,



Seems like you have configured the data dir to be something other than a folder in /home; if so, try creating another folder and adding it to "dfs.datanode.data.dir", separated by a comma, instead of trying to reset the default.

It is also advised not to use the root partition "/" for the HDFS data dir: if the dir usage hits the maximum, the OS might fail to function properly.



Regards,

+ Naga


--------------------------------------------------------------------------------

From: P lva [ruvikal@gmail.com]
Sent: Thursday, November 05, 2015 03:11
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


What does your dfs.datanode.data.dir point to ?



On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com> wrote:

        Filesystem Size Used Avail Use% Mounted on 
        /dev/mapper/centos-root 50G 12G 39G 23% / 
        devtmpfs 16G 0 16G 0% /dev 
        tmpfs 16G 0 16G 0% /dev/shm 
        tmpfs 16G 1.4G 15G 9% /run 
        tmpfs 16G 0 16G 0% /sys/fs/cgroup 
        /dev/sda2 494M 123M 372M 25% /boot 
        /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home 


  That’s from one datanode. The second one is nearly identical. I discovered that 50GB is actually a default. That seems really weird. Disk space is cheap. Why would you not just use most of the disk and why is it so hard to reset the default?

  Adaryl "Bob" Wakefield, MBA
  Principal
  Mass Street Analytics, LLC
  913.938.6685
  www.linkedin.com/in/bobwakefieldmba
  Twitter: @BobLovesData

  From: Chris Nauroth 
  Sent: Wednesday, November 04, 2015 12:16 PM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  How are those drives partitioned?  Is it possible that the directories pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on partitions that are sized to only 100 GB?  Running commands like df would be a good way to check this at the OS level, independently of Hadoop.

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 11:16 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: Re: hadoop not using whole disk for HDFS


  Yeah. It has the current value of 1073741824 which is like 1.07 gig.

  B.
  From: Chris Nauroth 
  Sent: Tuesday, November 03, 2015 11:57 AM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  Hi Bob,

  Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>0</value>
    <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
    </description>
  </property>

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 10:51 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: hadoop not using whole disk for HDFS


  I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere? 
  B.

Re: hadoop not using whole disk for HDFS

Posted by "Adaryl \"Bob\" Wakefield, MBA" <ad...@hotmail.com>.
This is an experimental cluster and there isn’t anything I can’t lose. I ran into some issues. I’m running the Hortonworks distro and am managing things through Ambari. 

1. I wasn’t able to set the config to /home/hdfs/data. I got an error that told me I’m not allowed to set that config to the /home directory. So I made it /hdfs/data.
2. When I restarted, the space available increased by a whopping 100GB.



Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Wednesday, November 04, 2015 4:26 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Better would be to stop the daemons, copy the data from /hadoop/hdfs/data to /home/hdfs/data, reconfigure dfs.datanode.data.dir to /home/hdfs/data, and then start the daemons, if the data is comparatively small.

Ensure you have a backup if you have any critical data!



Regards,

+ Naga


--------------------------------------------------------------------------------

From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Thursday, November 05, 2015 03:40
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


So like I can just create a new folder in the home directory like:
/home/hdfs/data
and then set dfs.datanode.data.dir to:
/hadoop/hdfs/data,/home/hdfs/data

Restart the node and that should do it correct?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Wednesday, November 04, 2015 3:59 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Hi Bob,



Seems like you have configured the data dir to be something other than a folder in /home. If so, try creating another folder and adding it to "dfs.datanode.data.dir", separated by a comma, instead of trying to reset the default.

And it is also advised not to use the root partition "/" for the HDFS data dir; if the directory usage hits its maximum, the OS might fail to function properly.



Regards,

+ Naga


--------------------------------------------------------------------------------

From: P lva [ruvikal@gmail.com]
Sent: Thursday, November 05, 2015 03:11
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


What does your dfs.datanode.data.dir point to ?



On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com> wrote:

        Filesystem Size Used Avail Use% Mounted on 
        /dev/mapper/centos-root 50G 12G 39G 23% / 
        devtmpfs 16G 0 16G 0% /dev 
        tmpfs 16G 0 16G 0% /dev/shm 
        tmpfs 16G 1.4G 15G 9% /run 
        tmpfs 16G 0 16G 0% /sys/fs/cgroup 
        /dev/sda2 494M 123M 372M 25% /boot 
        /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home 


  That’s from one datanode. The second one is nearly identical. I discovered that 50GB is actually a default. That seems really weird. Disk space is cheap. Why would you not just use most of the disk and why is it so hard to reset the default?

  Adaryl "Bob" Wakefield, MBA
  Principal
  Mass Street Analytics, LLC
  913.938.6685
  www.linkedin.com/in/bobwakefieldmba
  Twitter: @BobLovesData

  From: Chris Nauroth 
  Sent: Wednesday, November 04, 2015 12:16 PM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  How are those drives partitioned?  Is it possible that the directories pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on partitions that are sized to only 100 GB?  Running commands like df would be a good way to check this at the OS level, independently of Hadoop.

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 11:16 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: Re: hadoop not using whole disk for HDFS


  Yeah. It has the current value of 1073741824 which is like 1.07 gig.

  B.
  From: Chris Nauroth 
  Sent: Tuesday, November 03, 2015 11:57 AM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  Hi Bob,

  Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>0</value>
    <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
    </description>
  </property>

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 10:51 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: hadoop not using whole disk for HDFS


  I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere? 
  B.

RE: hadoop not using whole disk for HDFS

Posted by "Naganarasimha G R (Naga)" <ga...@huawei.com>.
Better would be to stop the daemons, copy the data from /hadoop/hdfs/data to /home/hdfs/data, reconfigure dfs.datanode.data.dir to /home/hdfs/data, and then start the daemons, if the data is comparatively small.

Ensure you have a backup if you have any critical data!
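
A rough sketch of that sequence on one datanode, assuming the daemons are stopped and started from Ambari, the DataNode runs as the hdfs user (group hadoop), and /home/hdfs/data sits on the large /home partition; adjust the paths to your layout:

  # with the DataNode stopped, copy the existing blocks, preserving ownership and permissions
  mkdir -p /home/hdfs/data
  rsync -a /hadoop/hdfs/data/ /home/hdfs/data/
  chown -R hdfs:hadoop /home/hdfs/data
  # then point dfs.datanode.data.dir at /home/hdfs/data and start the DataNode again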



Regards,

+ Naga

________________________________
From: Adaryl "Bob" Wakefield, MBA [adaryl.wakefield@hotmail.com]
Sent: Thursday, November 05, 2015 03:40
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS

So like I can just create a new folder in the home directory like:
/home/hdfs/data
and then set dfs.datanode.data.dir to:
/hadoop/hdfs/data,/home/hdfs/data

Restart the node and that should do it correct?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga)<ma...@huawei.com>
Sent: Wednesday, November 04, 2015 3:59 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: hadoop not using whole disk for HDFS


Hi Bob,



Seems like you have configured the data dir to be something other than a folder in /home. If so, try creating another folder and adding it to "dfs.datanode.data.dir", separated by a comma, instead of trying to reset the default.

And it is also advised not to use the root partition "/" for the HDFS data dir; if the directory usage hits its maximum, the OS might fail to function properly.



Regards,

+ Naga

________________________________
From: P lva [ruvikal@gmail.com]
Sent: Thursday, November 05, 2015 03:11
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS

What does your dfs.datanode.data.dir point to ?


On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com>> wrote:
Filesystem      Size    Used    Avail   Use%    Mounted on
/dev/mapper/centos-root 50G     12G     39G     23%     /
devtmpfs        16G     0       16G     0%      /dev
tmpfs   16G     0       16G     0%      /dev/shm
tmpfs   16G     1.4G    15G     9%      /run
tmpfs   16G     0       16G     0%      /sys/fs/cgroup
/dev/sda2       494M    123M    372M    25%     /boot
/dev/mapper/centos-home 2.7T    33M     2.7T    1%      /home

That’s from one datanode. The second one is nearly identical. I discovered that 50GB is actually a default. That seems really weird. Disk space is cheap. Why would you not just use most of the disk and why is it so hard to reset the default?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685<tel:913.938.6685>
www.linkedin.com/in/bobwakefieldmba<http://www.linkedin.com/in/bobwakefieldmba>
Twitter: @BobLovesData

From: Chris Nauroth<ma...@hortonworks.com>
Sent: Wednesday, November 04, 2015 12:16 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: hadoop not using whole disk for HDFS

How are those drives partitioned?  Is it possible that the directories pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on partitions that are sized to only 100 GB?  Running commands like df would be a good way to check this at the OS level, independently of Hadoop.

--Chris Nauroth

From: MBA <ad...@hotmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Tuesday, November 3, 2015 at 11:16 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Re: hadoop not using whole disk for HDFS

Yeah. It has the current value of 1073741824 which is like 1.07 gig.

B.
From: Chris Nauroth<ma...@hortonworks.com>
Sent: Tuesday, November 03, 2015 11:57 AM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: hadoop not using whole disk for HDFS

Hi Bob,

Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>0</value>
  <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
  </description>
</property>

--Chris Nauroth

From: MBA <ad...@hotmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Tuesday, November 3, 2015 at 10:51 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: hadoop not using whole disk for HDFS

I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere?

B.


Re: hadoop not using whole disk for HDFS

Posted by "Adaryl \"Bob\" Wakefield, MBA" <ad...@hotmail.com>.
So like I can just create a new folder in the home directory like:
/home/hdfs/data
and then set dfs.datanode.data.dir to:
/hadoop/hdfs/data,/home/hdfs/data

Restart the node and that should do it correct?
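
For reference, a minimal sketch of the manual steps behind that, assuming the new directory ends up on the large partition and the DataNode runs as the hdfs user (group hadoop):

  # create the extra directory and hand it to the hdfs user
  mkdir -p /home/hdfs/data
  chown -R hdfs:hadoop /home/hdfs/data
  # then set, via Ambari or hdfs-site.xml:
  #   dfs.datanode.data.dir = /hadoop/hdfs/data,/home/hdfs/data
  # and restart the DataNode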

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Naganarasimha G R (Naga) 
Sent: Wednesday, November 04, 2015 3:59 PM
To: user@hadoop.apache.org 
Subject: RE: hadoop not using whole disk for HDFS

Hi Bob,



Seems like you have configured the data dir to be something other than a folder in /home. If so, try creating another folder and adding it to "dfs.datanode.data.dir", separated by a comma, instead of trying to reset the default.

And it is also advised not to use the root partition "/" for the HDFS data dir; if the directory usage hits its maximum, the OS might fail to function properly.



Regards,

+ Naga


--------------------------------------------------------------------------------

From: P lva [ruvikal@gmail.com]
Sent: Thursday, November 05, 2015 03:11
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS


What does your dfs.datanode.data.dir point to ?



On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com> wrote:

        Filesystem Size Used Avail Use% Mounted on 
        /dev/mapper/centos-root 50G 12G 39G 23% / 
        devtmpfs 16G 0 16G 0% /dev 
        tmpfs 16G 0 16G 0% /dev/shm 
        tmpfs 16G 1.4G 15G 9% /run 
        tmpfs 16G 0 16G 0% /sys/fs/cgroup 
        /dev/sda2 494M 123M 372M 25% /boot 
        /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home 


  That’s from one datanode. The second one is nearly identical. I discovered that 50GB is actually a default. That seems really weird. Disk space is cheap. Why would you not just use most of the disk and why is it so hard to reset the default?

  Adaryl "Bob" Wakefield, MBA
  Principal
  Mass Street Analytics, LLC
  913.938.6685
  www.linkedin.com/in/bobwakefieldmba
  Twitter: @BobLovesData

  From: Chris Nauroth 
  Sent: Wednesday, November 04, 2015 12:16 PM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  How are those drives partitioned?  Is it possible that the directories pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on partitions that are sized to only 100 GB?  Running commands like df would be a good way to check this at the OS level, independently of Hadoop.

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 11:16 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: Re: hadoop not using whole disk for HDFS


  Yeah. It has the current value of 1073741824 which is like 1.07 gig.

  B.
  From: Chris Nauroth 
  Sent: Tuesday, November 03, 2015 11:57 AM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  Hi Bob,

  Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>0</value>
    <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
    </description>
  </property>

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 10:51 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: hadoop not using whole disk for HDFS


  I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere? 
  B.

RE: hadoop not using whole disk for HDFS

Posted by "Naganarasimha G R (Naga)" <ga...@huawei.com>.
Hi Bob,



Seems like you have configured the data dir to be something other than a folder in /home. If so, try creating another folder and adding it to "dfs.datanode.data.dir", separated by a comma, instead of trying to reset the default.

And it is also advised not to use the root partition "/" for the HDFS data dir; if the directory usage hits its maximum, the OS might fail to function properly.
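
A quick way to check which partition each path actually lands on, using the mount points from the df output quoted below in this thread:

  # /hadoop/hdfs/data resolves to the 50 GB root volume; /home is the 2.7 TB volume
  df -h /hadoop/hdfs/data /home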



Regards,

+ Naga

________________________________

From: P lva [ruvikal@gmail.com]
Sent: Thursday, November 05, 2015 03:11
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS

What does your dfs.datanode.data.dir point to ?


On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com>> wrote:
Filesystem      Size    Used    Avail   Use%    Mounted on
/dev/mapper/centos-root 50G     12G     39G     23%     /
devtmpfs        16G     0       16G     0%      /dev
tmpfs   16G     0       16G     0%      /dev/shm
tmpfs   16G     1.4G    15G     9%      /run
tmpfs   16G     0       16G     0%      /sys/fs/cgroup
/dev/sda2       494M    123M    372M    25%     /boot
/dev/mapper/centos-home 2.7T    33M     2.7T    1%      /home

That’s from one datanode. The second one is nearly identical. I discovered that 50GB is actually a default. That seems really weird. Disk space is cheap. Why would you not just use most of the disk and why is it so hard to reset the default?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685<tel:913.938.6685>
www.linkedin.com/in/bobwakefieldmba<http://www.linkedin.com/in/bobwakefieldmba>
Twitter: @BobLovesData

From: Chris Nauroth<ma...@hortonworks.com>
Sent: Wednesday, November 04, 2015 12:16 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: hadoop not using whole disk for HDFS

How are those drives partitioned?  Is it possible that the directories pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on partitions that are sized to only 100 GB?  Running commands like df would be a good way to check this at the OS level, independently of Hadoop.

--Chris Nauroth

From: MBA <ad...@hotmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Tuesday, November 3, 2015 at 11:16 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Re: hadoop not using whole disk for HDFS

Yeah. It has the current value of 1073741824 which is like 1.07 gig.

B.
From: Chris Nauroth<ma...@hortonworks.com>
Sent: Tuesday, November 03, 2015 11:57 AM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: hadoop not using whole disk for HDFS

Hi Bob,

Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>0</value>
  <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
  </description>
</property>

--Chris Nauroth

From: MBA <ad...@hotmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Tuesday, November 3, 2015 at 10:51 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: hadoop not using whole disk for HDFS

I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere?

B.


Re: hadoop not using whole disk for HDFS

Posted by "Adaryl \"Bob\" Wakefield, MBA" <ad...@hotmail.com>.
/hadoop/hdfs/data

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: P lva 
Sent: Wednesday, November 04, 2015 3:41 PM
To: user@hadoop.apache.org 
Subject: Re: hadoop not using whole disk for HDFS

What does your dfs.datanode.data.dir point to ?



On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com> wrote:

        Filesystem Size Used Avail Use% Mounted on 
        /dev/mapper/centos-root 50G 12G 39G 23% / 
        devtmpfs 16G 0 16G 0% /dev 
        tmpfs 16G 0 16G 0% /dev/shm 
        tmpfs 16G 1.4G 15G 9% /run 
        tmpfs 16G 0 16G 0% /sys/fs/cgroup 
        /dev/sda2 494M 123M 372M 25% /boot 
        /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home 


  That’s from one datanode. The second one is nearly identical. I discovered that 50GB is actually a default. That seems really weird. Disk space is cheap. Why would you not just use most of the disk and why is it so hard to reset the default?

  Adaryl "Bob" Wakefield, MBA
  Principal
  Mass Street Analytics, LLC
  913.938.6685
  www.linkedin.com/in/bobwakefieldmba
  Twitter: @BobLovesData

  From: Chris Nauroth 
  Sent: Wednesday, November 04, 2015 12:16 PM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  How are those drives partitioned?  Is it possible that the directories pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on partitions that are sized to only 100 GB?  Running commands like df would be a good way to check this at the OS level, independently of Hadoop.

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 11:16 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: Re: hadoop not using whole disk for HDFS


  Yeah. It has the current value of 1073741824 which is like 1.07 gig.

  B.
  From: Chris Nauroth 
  Sent: Tuesday, November 03, 2015 11:57 AM
  To: user@hadoop.apache.org 
  Subject: Re: hadoop not using whole disk for HDFS

  Hi Bob,

  Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>0</value>
    <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
    </description>
  </property>

  --Chris Nauroth

  From: MBA <ad...@hotmail.com>
  Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Date: Tuesday, November 3, 2015 at 10:51 AM
  To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
  Subject: hadoop not using whole disk for HDFS


  I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere? 
  B.

RE: hadoop not using whole disk for HDFS

Posted by "Naganarasimha G R (Naga)" <ga...@huawei.com>.
Hi Bob,



Seems like you have configured the data dir to be somewhere other than a folder under /home. If so, try creating another folder there and adding it to "dfs.datanode.data.dir", separated by a comma, instead of trying to reset the default (a sample entry is sketched below).

It is also advisable not to configure the root partition "/" as an HDFS data dir; if the directory usage hits its limit, the OS might fail to function properly.
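
For example, assuming a new folder such as /home/hdfs/data is created on the large /home partition (the path is just an illustration), the hdfs-site.xml entry could look roughly like this:

<property>
  <name>dfs.datanode.data.dir</name>
  <value>/hadoop/hdfs/data,/home/hdfs/data</value>
  <description>Comma-separated list of local directories where the DataNode stores block replicas.
  </description>
</property>

The new folder needs to be writable by the user the DataNode runs as (typically hdfs), and the DataNodes need a restart to pick up the change.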



Regards,

+ Naga

________________________________

From: P lva [ruvikal@gmail.com]
Sent: Thursday, November 05, 2015 03:11
To: user@hadoop.apache.org
Subject: Re: hadoop not using whole disk for HDFS

What does your dfs.datanode.data.dir point to ?


On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com>> wrote:
Filesystem      Size    Used    Avail   Use%    Mounted on
/dev/mapper/centos-root 50G     12G     39G     23%     /
devtmpfs        16G     0       16G     0%      /dev
tmpfs   16G     0       16G     0%      /dev/shm
tmpfs   16G     1.4G    15G     9%      /run
tmpfs   16G     0       16G     0%      /sys/fs/cgroup
/dev/sda2       494M    123M    372M    25%     /boot
/dev/mapper/centos-home 2.7T    33M     2.7T    1%      /home

That’s from one datanode. The second one is nearly identical. I discovered that 50GB is actually a default. That seems really weird. Disk space is cheap. Why would you not just use most of the disk and why is it so hard to reset the default?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685<tel:913.938.6685>
www.linkedin.com/in/bobwakefieldmba<http://www.linkedin.com/in/bobwakefieldmba>
Twitter: @BobLovesData

From: Chris Nauroth<ma...@hortonworks.com>
Sent: Wednesday, November 04, 2015 12:16 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: hadoop not using whole disk for HDFS

How are those drives partitioned?  Is it possible that the directories pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on partitions that are sized to only 100 GB?  Running commands like df would be a good way to check this at the OS level, independently of Hadoop.

--Chris Nauroth

From: MBA <ad...@hotmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Tuesday, November 3, 2015 at 11:16 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Re: hadoop not using whole disk for HDFS

Yeah. It has the current value of 1073741824 which is like 1.07 gig.

B.
From: Chris Nauroth<ma...@hortonworks.com>
Sent: Tuesday, November 03, 2015 11:57 AM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: hadoop not using whole disk for HDFS

Hi Bob,

Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>0</value>
  <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
  </description>
</property>

--Chris Nauroth

From: MBA <ad...@hotmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Tuesday, November 3, 2015 at 10:51 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: hadoop not using whole disk for HDFS

I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere?

B.


Re: hadoop not using whole disk for HDFS

Posted by P lva <ru...@gmail.com>.
What does your dfs.datanode.data.dir point to ?


On Wed, Nov 4, 2015 at 4:14 PM, Adaryl "Bob" Wakefield, MBA <
adaryl.wakefield@hotmail.com> wrote:

> Filesystem Size Used Avail Use% Mounted on
> /dev/mapper/centos-root 50G 12G 39G 23% /
> devtmpfs 16G 0 16G 0% /dev
> tmpfs 16G 0 16G 0% /dev/shm
> tmpfs 16G 1.4G 15G 9% /run
> tmpfs 16G 0 16G 0% /sys/fs/cgroup
> /dev/sda2 494M 123M 372M 25% /boot
> /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home
>
> That’s from one datanode. The second one is nearly identical. I discovered
> that 50GB is actually a default. That seems really weird. Disk space is
> cheap. Why would you not just use most of the disk and why is it so hard to
> reset the default?
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
> *From:* Chris Nauroth <cn...@hortonworks.com>
> *Sent:* Wednesday, November 04, 2015 12:16 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: hadoop not using whole disk for HDFS
>
> How are those drives partitioned?  Is it possible that the directories
> pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on
> partitions that are sized to only 100 GB?  Running commands like df would
> be a good way to check this at the OS level, independently of Hadoop.
>
> --Chris Nauroth
>
> From: MBA <ad...@hotmail.com>
> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Date: Tuesday, November 3, 2015 at 11:16 AM
> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Subject: Re: hadoop not using whole disk for HDFS
>
> Yeah. It has the current value of 1073741824 which is like 1.07 gig.
>
> B.
> *From:* Chris Nauroth <cn...@hortonworks.com>
> *Sent:* Tuesday, November 03, 2015 11:57 AM
> *To:* user@hadoop.apache.org
> *Subject:* Re: hadoop not using whole disk for HDFS
>
> Hi Bob,
>
> Does the hdfs-site.xml configuration file contain the property
> dfs.datanode.du.reserved?  If this is defined, then the DataNode
> intentionally will not use this space for storage of replicas.
>
> <property>
>   <name>dfs.datanode.du.reserved</name>
>   <value>0</value>
>   <description>Reserved space in bytes per volume. Always leave this much
> space free for non dfs use.
>   </description>
> </property>
>
> --Chris Nauroth
>
> From: MBA <ad...@hotmail.com>
> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Date: Tuesday, November 3, 2015 at 10:51 AM
> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Subject: hadoop not using whole disk for HDFS
>
> I’ve got the Hortonworks distro running on a three node cluster. For some
> reason the disk available for HDFS is MUCH less than the total disk space.
> Both of my data nodes have 3TB hard drives. Only 100GB of that is being
> used for HDFS. Is it possible that I have a setting wrong somewhere?
>
> B.
>

Re: hadoop not using whole disk for HDFS

Posted by "Adaryl \"Bob\" Wakefield, MBA" <ad...@hotmail.com>.
      Filesystem Size Used Avail Use% Mounted on 
      /dev/mapper/centos-root 50G 12G 39G 23% / 
      devtmpfs 16G 0 16G 0% /dev 
      tmpfs 16G 0 16G 0% /dev/shm 
      tmpfs 16G 1.4G 15G 9% /run 
      tmpfs 16G 0 16G 0% /sys/fs/cgroup 
      /dev/sda2 494M 123M 372M 25% /boot 
      /dev/mapper/centos-home 2.7T 33M 2.7T 1% /home 


That’s from one datanode. The second one is nearly identical. I discovered that 50GB is actually a default. That seems really weird. Disk space is cheap. Why would you not just use most of the disk and why is it so hard to reset the default?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Chris Nauroth 
Sent: Wednesday, November 04, 2015 12:16 PM
To: user@hadoop.apache.org 
Subject: Re: hadoop not using whole disk for HDFS

How are those drives partitioned?  Is it possible that the directories pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on partitions that are sized to only 100 GB?  Running commands like df would be a good way to check this at the OS level, independently of Hadoop.

--Chris Nauroth

From: MBA <ad...@hotmail.com>
Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Date: Tuesday, November 3, 2015 at 11:16 AM
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: Re: hadoop not using whole disk for HDFS


Yeah. It has the current value of 1073741824 which is like 1.07 gig.

B.
From: Chris Nauroth 
Sent: Tuesday, November 03, 2015 11:57 AM
To: user@hadoop.apache.org 
Subject: Re: hadoop not using whole disk for HDFS

Hi Bob,

Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>0</value>
  <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
  </description>
</property>

--Chris Nauroth

From: MBA <ad...@hotmail.com>
Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Date: Tuesday, November 3, 2015 at 10:51 AM
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: hadoop not using whole disk for HDFS


I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere? 
B.

Re: hadoop not using whole disk for HDFS

Posted by Chris Nauroth <cn...@hortonworks.com>.
How are those drives partitioned?  Is it possible that the directories pointed to by the dfs.datanode.data.dir property in hdfs-site.xml reside on partitions that are sized to only 100 GB?  Running commands like df would be a good way to check this at the OS level, independently of Hadoop.
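
For instance (using the data dir path mentioned elsewhere in this thread purely as an example), something along these lines would show both views:

df -h /hadoop/hdfs/data
hdfs dfsadmin -report

The df command reports which partition actually backs the data directory and how large it is, while dfsadmin -report (run as the hdfs user if permissions require it) shows the capacity HDFS believes it has configured, so the two numbers can be compared directly.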

--Chris Nauroth

From: MBA <ad...@hotmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Tuesday, November 3, 2015 at 11:16 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Re: hadoop not using whole disk for HDFS

Yeah. It has the current value of 1073741824 which is like 1.07 gig.

B.
From: Chris Nauroth<ma...@hortonworks.com>
Sent: Tuesday, November 03, 2015 11:57 AM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: hadoop not using whole disk for HDFS

Hi Bob,

Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>0</value>
  <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
  </description>
</property>

--Chris Nauroth

From: MBA <ad...@hotmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Tuesday, November 3, 2015 at 10:51 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: hadoop not using whole disk for HDFS

I've got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere?

B.

Re: hadoop not using whole disk for HDFS

Posted by "Adaryl \"Bob\" Wakefield, MBA" <ad...@hotmail.com>.
Yeah. It has the current value of 1073741824 which is like 1.07 gig.

B.
From: Chris Nauroth 
Sent: Tuesday, November 03, 2015 11:57 AM
To: user@hadoop.apache.org 
Subject: Re: hadoop not using whole disk for HDFS

Hi Bob,

Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>0</value>
  <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
  </description>
</property>

--Chris Nauroth

From: MBA <ad...@hotmail.com>
Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Date: Tuesday, November 3, 2015 at 10:51 AM
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: hadoop not using whole disk for HDFS


I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere? 
B.

Re: hadoop not using whole disk for HDFS

Posted by Chris Nauroth <cn...@hortonworks.com>.
Hi Bob,

Does the hdfs-site.xml configuration file contain the property dfs.datanode.du.reserved?  If this is defined, then the DataNode intentionally will not use this space for storage of replicas.

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>0</value>
  <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
  </description>
</property>

--Chris Nauroth

From: MBA <ad...@hotmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Tuesday, November 3, 2015 at 10:51 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: hadoop not using whole disk for HDFS

I've got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere?

B.
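
As an illustration of how that reservation would be raised (the figure below is only an example, not a value from this thread), reserving roughly 10 GB per volume for non-DFS use would look like this in hdfs-site.xml:

<property>
  <name>dfs.datanode.du.reserved</name>
  <!-- illustrative value: 10 * 1024^3 bytes, i.e. ~10 GB held back per volume -->
  <value>10737418240</value>
</property>

The value a node has actually picked up can be checked on that node with "hdfs getconf -confKey dfs.datanode.du.reserved".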

hadoop not using whole disk for HDFS

Posted by "Adaryl \"Bob\" Wakefield, MBA" <ad...@hotmail.com>.
I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere? 
B.

Re: Utility to push data into HDFS

Posted by Vinayakumar B <vi...@apache.org>.
That's cool.

-Vinay

On Tue, Nov 3, 2015 at 9:34 PM, Shashi Vishwakarma <shashi.vish123@gmail.com
> wrote:

> Thanks all...It was a cluster issue...Its working for me now....:)
> On 3 Nov 2015 7:01 am, "Vinayakumar B" <vi...@huawei.com> wrote:
>
>> Hi Shashi,
>>
>>
>>
>>   Did you copy conf directory (ex: *<hadoop>/etc/hadoop *by default)
>> from any of the cluster machine’s Hadoop installation as mentioned in #1 in
>> Andreina’s reply below?
>> I hope, if cluster is running successfully with Kerberos enabled, it
>> should have a configuration “dfs.namenode.kerberos.principal"
>>
>>
>>
>>    Also you need to keep this directory ( yes, directory itself, not
>> files inside it) in class path of your client program.
>>
>>
>>
>> -Vinay
>>
>>
>>
>> *From:* Shashi Vishwakarma [mailto:shashi.vish123@gmail.com]
>> *Sent:* Monday, November 02, 2015 10:47 PM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Utility to push data into HDFS
>>
>>
>>
>> Hi Andreina,
>>
>>
>>
>> I used your java code and ran it using the java command. On the console I can see
>> message as Login Successful but while accessing HDFS I am getting below
>> error message:
>>
>>
>>
>> "Failed to specify server's kerberos principal name"
>>
>>
>>
>> Any suggestion for this?
>>
>>
>>
>> Thanks and Regards,
>>
>> Shashi
>>
>>
>>
>> On Mon, Nov 2, 2015 at 4:36 PM, andreina j <an...@huawei.com> wrote:
>>
>>
>>
>> Hi Shashi Vishwakarma ,
>>
>>
>>
>> You can follow below steps to perform HDFS operation using java code on a
>> secure cluster
>>
>>
>>
>> 1.      Copy krb5.conf, hdfs.keytab and conf directory from installed
>> cluster
>>
>> 2.       Create a maven project with a dependency on hadoop-client
>>
>>     <dependency>
>>
>>     <groupId>org.apache.hadoop</groupId>
>>
>>    <artifactId>hadoop-client</artifactId>
>>
>>    <version><version>-SNAPSHOT</version>
>>
>>    </dependency>
>>
>>
>>
>> 3.      Build the maven project, to resolve all the dependencies
>>
>> 4.      Add conf directory to classpath.
>>
>> 5.      Use below sample code to perform HDFS operation.
>>
>>
>>
>>             public class KerberosTest {
>>
>>
>>
>>                public static void main(String[] args) throws IOException {
>>
>>                  // This should be ideally default. now just for this
>> purpose overriding
>>
>>                  System.setProperty("java.security.krb5.conf",
>> "D:\\data\\Desktop\\cluster-test\\krb5.conf");
>>
>>
>>
>>                  // Login using keytab if have access to keytab. else
>>
>>                  UserGroupInformation.loginUserFromKeytab("hdfs @
>> HADOOP.COM",
>>
>>
>>          "D:\\data\\Desktop\\cluster-test\\conf\\hdfs.keytab");
>>
>>
>>
>>                  String dest = "/test/userupload/file";
>>
>>                  String localFile = "pom.xml";
>>
>>
>>
>>                  Configuration conf = new HdfsConfiguration();
>>
>>                  FileSystem fs = FileSystem.get(conf);
>>
>>                  FSDataOutputStream out = fs.create(new Path(dest));
>>
>>                  FileInputStream fIn = new FileInputStream(localFile);
>>
>>                  IOUtils.copyBytes(fIn, out, 1024);
>>
>>               }
>>
>>
>>
>>             }
>>
>>          Note: Change the paths mentioned above accordingly
>>
>>
>>
>> Regards,
>>
>> Andreina J.
>>
>>
>>
>> *From:* Shashi Vishwakarma [mailto:shashi.vish123@gmail.com]
>> *Sent:* 02 November 2015 PM 01:18
>>
>>
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Utility to push data into HDFS
>>
>>
>>
>> Hi Naga and Chris,
>>
>>
>>
>> Yes you are right. I don't have hadoop installed on my windows machine
>> and i wish to move my files from windows to remote hadoop cluster (on linux
>> server).
>>
>>
>>
>> And also my cluster is Kerberos enabled. Can you please help here? Let me
>> know the steps that should I follow to implement it?
>>
>>
>>
>> Thanks and Regards
>>
>> Shashi
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Nov 2, 2015 at 7:33 AM, Naganarasimha G R (Naga) <
>> garlanaganarasimha@huawei.com> wrote:
>>
>> Hi Shashi,
>>
>>
>>
>> Not sure I got your question right, but if it's related to building
>> Hadoop on windows then I think whatever steps mentioned by James and Chris
>> would definitely help.
>>
>> But is your scenario to remotely(not on one of the nodes of cluster)
>> access HDFS through java from either windows or linux machines ?
>>
>> In that case certain set of jars needs to be in client machine(refer
>> hadoop-client/pom.xml) and subset of the server configurations (even if
>> full not a problem) is required to access the HDFS and YARN
>>
>>
>>
>> @Chris Nauroth,  Are native components (winutils.exe and hadoop.dll),
>> required in the remote machine ? AFAIK its not required, correct me if i am
>> wrong !
>>
>>
>>
>> + Naga
>>
>>
>>
>>
>> ------------------------------
>>
>>
>>
>> *From:* Chris Nauroth [cnauroth@hortonworks.com]
>> *Sent:* Monday, November 02, 2015 02:10
>> *To:* user@hadoop.apache.org
>>
>>
>> *Subject:* Re: Utility to push data into HDFS
>>
>>
>>
>> In addition to the standard Hadoop jars available in an Apache Hadoop
>> distro, Windows also requires the native components for Windows:
>> winutils.exe and hadoop.dll.  This wiki page has more details on how that
>> works:
>>
>>
>>
>> https://wiki.apache.org/hadoop/WindowsProblems
>>
>>
>>
>> --Chris Nauroth
>>
>>
>>
>> *From: *James Bond <bo...@gmail.com>
>> *Reply-To: *"user@hadoop.apache.org" <us...@hadoop.apache.org>
>> *Date: *Sunday, November 1, 2015 at 9:35 AM
>> *To: *"user@hadoop.apache.org" <us...@hadoop.apache.org>
>> *Subject: *Re: Utility to push data into HDFS
>>
>>
>>
>> I am guessing this should work -
>>
>>
>>
>>
>> https://stackoverflow.com/questions/9722257/building-jar-that-includes-all-its-dependencies
>>
>>
>>
>> On Sun, Nov 1, 2015 at 8:15 PM, Shashi Vishwakarma <
>> shashi.vish123@gmail.com> wrote:
>>
>> Hi Chris,
>>
>>
>>
>> Thanks for your reply. I agree WebHDFS is one of the options to access
>> hadoop from windows or *nix. I wanted to know if I can write java code
>> that can be executed from windows?
>>
>>
>>
>> Ex:  java HDFSPut.java  <<- this java code should have FSShell command
>> (hadoop fs -ls) written in java.
>>
>>
>>
>> In order to execute this , what are list items I should have on windows?
>>
>> For example hadoop jars etc.
>>
>>
>>
>> If you can throw some light on this then it would be great help.
>>
>>
>>
>> Thanks
>>
>> Shashi
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Sun, Nov 1, 2015 at 1:39 AM, Chris Nauroth <cn...@hortonworks.com>
>> wrote:
>>
>> Hello Shashi,
>>
>>
>>
>> Maybe I'm missing some context, but are the Hadoop FsShell commands
>> sufficient?
>>
>>
>>
>>
>> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/FileSystemShell.html
>>
>>
>>
>> These commands work on both *nix and Windows.
>>
>>
>>
>> Another option would be WebHDFS, which just requires an HTTP client on
>> your platform of choice.
>>
>>
>>
>>
>> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
>>
>>
>>
>> --Chris Nauroth
>>
>>
>>
>> *From: *Shashi Vishwakarma <sh...@gmail.com>
>> *Reply-To: *"user@hadoop.apache.org" <us...@hadoop.apache.org>
>> *Date: *Saturday, October 31, 2015 at 5:46 AM
>> *To: *"user@hadoop.apache.org" <us...@hadoop.apache.org>
>> *Subject: *Utility to push data into HDFS
>>
>>
>>
>> Hi
>>
>> I need build a common utility for unix/windows based system to push data
>> into hadoop system. User can run that utility from any platform and should
>> be able to push data into HDFS.
>>
>> Any suggestions ?
>>
>> Thanks
>>
>> Shashi
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
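
Putting the quoted steps together, here is a minimal, self-contained sketch of the client (a sketch only: the krb5.conf and keytab paths, the principal, the destination path and the buffer size are placeholders, and the hadoop-client <version> in the POM has to be the cluster's actual Hadoop version rather than the "<version>-SNAPSHOT" placeholder shown above):

import java.io.FileInputStream;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosTest {

    public static void main(String[] args) throws IOException {
        // Placeholder path: point this at the krb5.conf copied from the cluster.
        System.setProperty("java.security.krb5.conf", "/path/to/krb5.conf");

        // Picks up core-site.xml / hdfs-site.xml from the conf directory on the classpath.
        Configuration conf = new HdfsConfiguration();
        UserGroupInformation.setConfiguration(conf);

        // Placeholder principal and keytab: use the real ones from the cluster.
        UserGroupInformation.loginUserFromKeytab("hdfs@EXAMPLE.COM", "/path/to/hdfs.keytab");

        String localFile = "pom.xml";                     // local file to upload
        Path dest = new Path("/test/userupload/file");    // HDFS destination

        FileSystem fs = FileSystem.get(conf);
        try (FSDataOutputStream out = fs.create(dest);
             FileInputStream in = new FileInputStream(localFile)) {
            IOUtils.copyBytes(in, out, 4096);             // copy in 4 KB chunks
        }
    }
}

As noted above, the conf directory copied from the cluster (the directory itself, not just the files in it) has to be on the classpath when this runs, e.g. something along the lines of: java -cp "conf:myclient.jar:lib/*" KerberosTest on unix, with ';' as the path separator on Windows.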

RE: Utility to push data into HDFS

Posted by Shashi Vishwakarma <sh...@gmail.com>.
Thanks all... It was a cluster issue... It's working for me now. :)
On 3 Nov 2015 7:01 am, "Vinayakumar B" <vi...@huawei.com> wrote:

> Hi Shashi,
>
>
>
>   Did you copy conf directory (ex: *<hadoop>/etc/hadoop *by default) from
> any of the cluster machine’s Hadoop installation as mentioned in #1 in
> Andreina’s reply below?
> I hope, if cluster is running successfully with Kerberos enabled, it
> should have a configuration “dfs.namenode.kerberos.principal"
>
>
>
>    Also you need to keep this directory ( yes, directory itself, not files
> inside it) in class path of your client program.
>
>
>
> -Vinay
>
>
>
> *From:* Shashi Vishwakarma [mailto:shashi.vish123@gmail.com]
> *Sent:* Monday, November 02, 2015 10:47 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Utility to push data into HDFS
>
>
>
> Hi Andreina,
>
>
>
> I used your java code and ran it using the java command. On the console I can see
> message as Login Successful but while accessing HDFS I am getting below
> error message:
>
>
>
> "Failed to specify server's kerberos principal name"
>
>
>
> Any suggestion for this?
>
>
>
> Thanks and Regards,
>
> Shashi
>
>
>
> On Mon, Nov 2, 2015 at 4:36 PM, andreina j <an...@huawei.com> wrote:
>
>
>
> Hi Shashi Vishwakarma ,
>
>
>
> You can follow below steps to perform HDFS operation using java code on a
> secure cluster
>
>
>
> 1.      Copy krb5.conf, hdfs.keytab and conf directory from installed
> cluster
>
> 2.       Create a maven project with a dependency on hadoop-client
>
>     <dependency>
>
>     <groupId>org.apache.hadoop</groupId>
>
>    <artifactId>hadoop-client</artifactId>
>
>    <version><version>-SNAPSHOT</version>
>
>    </dependency>
>
>
>
> 3.      Build the maven project, to resolve all the dependencies
>
> 4.      Add conf directory to classpath.
>
> 5.      Use below sample code to perform HDFS operation.
>
>
>
>             public class KerberosTest {
>
>
>
>                public static void main(String[] args) throws IOException {
>
>                  // This should be ideally default. now just for this
> purpose overriding
>
>                  System.setProperty("java.security.krb5.conf",
> "D:\\data\\Desktop\\cluster-test\\krb5.conf");
>
>
>
>                  // Login using keytab if have access to keytab. else
>
>                  UserGroupInformation.loginUserFromKeytab("hdfs @
> HADOOP.COM",
>
>                      "D:\\data\\Desktop\\cluster-test\\conf\\hdfs.keytab");
>
>
>
>                  String dest = "/test/userupload/file";
>
>                  String localFile = "pom.xml";
>
>
>
>                  Configuration conf = new HdfsConfiguration();
>
>                  FileSystem fs = FileSystem.get(conf);
>
>                  FSDataOutputStream out = fs.create(new Path(dest));
>
>                  FileInputStream fIn = new FileInputStream(localFile);
>
>                  IOUtils.copyBytes(fIn, out, 1024);
>
>               }
>
>
>
>             }
>
>          Note: Change the paths mentioned above accordingly
>
>
>
> Regards,
>
> Andreina J.
>
>
>
> *From:* Shashi Vishwakarma [mailto:shashi.vish123@gmail.com]
> *Sent:* 02 November 2015 PM 01:18
>
>
> *To:* user@hadoop.apache.org
> *Subject:* Re: Utility to push data into HDFS
>
>
>
> Hi Naga and Chris,
>
>
>
> Yes you are right. I don't have hadoop installed on my windows machine and
> i wish to move my files from windows to remote hadoop cluster (on linux
> server).
>
>
>
> And also my cluster is Kerberos enabled. Can you please help here? Let me
> know the steps that should I follow to implement it?
>
>
>
> Thanks and Regards
>
> Shashi
>
>
>
>
>
>
>
> On Mon, Nov 2, 2015 at 7:33 AM, Naganarasimha G R (Naga) <
> garlanaganarasimha@huawei.com> wrote:
>
> Hi Shashi,
>
>
>
> Not sure I got your question right, but if it's related to building
> Hadoop on windows then I think whatever steps mentioned by James and Chris
> would definitely help.
>
> But is your scenario to remotely(not on one of the nodes of cluster)
> access HDFS through java from either windows or linux machines ?
>
> In that case certain set of jars needs to be in client machine(refer
> hadoop-client/pom.xml) and subset of the server configurations (even if
> full not a problem) is required to access the HDFS and YARN
>
>
>
> @Chris Nauroth,  Are native components (winutils.exe and hadoop.dll),
> required in the remote machine ? AFAIK its not required, correct me if i am
> wrong !
>
>
>
> + Naga
>
>
>
>
> ------------------------------
>
>
>
> *From:* Chris Nauroth [cnauroth@hortonworks.com]
> *Sent:* Monday, November 02, 2015 02:10
> *To:* user@hadoop.apache.org
>
>
> *Subject:* Re: Utility to push data into HDFS
>
>
>
> In addition to the standard Hadoop jars available in an Apache Hadoop
> distro, Windows also requires the native components for Windows:
> winutils.exe and hadoop.dll.  This wiki page has more details on how that
> works:
>
>
>
> https://wiki.apache.org/hadoop/WindowsProblems
>
>
>
> --Chris Nauroth
>
>
>
> *From: *James Bond <bo...@gmail.com>
> *Reply-To: *"user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Date: *Sunday, November 1, 2015 at 9:35 AM
> *To: *"user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Subject: *Re: Utility to push data into HDFS
>
>
>
> I am guessing this should work -
>
>
>
>
> https://stackoverflow.com/questions/9722257/building-jar-that-includes-all-its-dependencies
>
>
>
> On Sun, Nov 1, 2015 at 8:15 PM, Shashi Vishwakarma <
> shashi.vish123@gmail.com> wrote:
>
> Hi Chris,
>
>
>
> Thanks for your reply. I agree WebHDFS is one of the options to access
> hadoop from windows or *nix. I wanted to know if I can write java code
> that can be executed from windows?
>
>
>
> Ex:  java HDFSPut.java  <<- this java code should have FSShell command
> (hadoop fs -ls) written in java.
>
>
>
> In order to execute this , what are list items I should have on windows?
>
> For example hadoop jars etc.
>
>
>
> If you can throw some light on this then it would be great help.
>
>
>
> Thanks
>
> Shashi
>
>
>
>
>
>
>
>
>
>
>
> On Sun, Nov 1, 2015 at 1:39 AM, Chris Nauroth <cn...@hortonworks.com>
> wrote:
>
> Hello Shashi,
>
>
>
> Maybe I'm missing some context, but are the Hadoop FsShell commands
> sufficient?
>
>
>
>
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/FileSystemShell.html
>
>
>
> These commands work on both *nix and Windows.
>
>
>
> Another option would be WebHDFS, which just requires an HTTP client on
> your platform of choice.
>
>
>
>
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
>
>
>
> --Chris Nauroth
>
>
>
> *From: *Shashi Vishwakarma <sh...@gmail.com>
> *Reply-To: *"user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Date: *Saturday, October 31, 2015 at 5:46 AM
> *To: *"user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Subject: *Utility to push data into HDFS
>
>
>
> Hi
>
> I need build a common utility for unix/windows based system to push data
> into hadoop system. User can run that utility from any platform and should
> be able to push data into HDFS.
>
> Any suggestions ?
>
> Thanks
>
> Shashi
>
>
>
>
>
>
>
>
>
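
For reference, the "Failed to specify server's kerberos principal name" error mentioned above typically means the client-side configuration never supplied the NameNode's principal, which is exactly what the copied conf directory is meant to provide; the relevant hdfs-site.xml entry looks roughly like this (the service name and realm below are placeholders -- the real value comes from the cluster's own hdfs-site.xml):

<property>
  <name>dfs.namenode.kerberos.principal</name>
  <!-- placeholder; copy the actual principal from the cluster configuration -->
  <value>nn/_HOST@EXAMPLE.COM</value>
</property>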

RE: Utility to push data into HDFS

Posted by Shashi Vishwakarma <sh...@gmail.com>.
Thanks all...It was a cluster issue...Its working for me now....:)
On 3 Nov 2015 7:01 am, "Vinayakumar B" <vi...@huawei.com> wrote:

> Hi Shashi,
>
>
>
>   Did you copy conf directory (ex: *<hadoop>/etc/hadoop *by default) from
> any of the cluster machine’s Hadoop installation as mentioned in #1 in
> Andreina’s reply below?
> I hope, if cluster is running successfully with Kerberos enabled, it
> should have a configuration “dfs.namenode.kerberos.principal"
>
>
>
>    Also you need to keep this directory ( yes, directory itself, not files
> inside it) in class path of your client program.
>
>
>
> -Vinay
>
>
>
> *From:* Shashi Vishwakarma [mailto:shashi.vish123@gmail.com]
> *Sent:* Monday, November 02, 2015 10:47 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Utility to push data into HDFS
>
>
>
> Hi Andreina,
>
>
>
> I used you java code and ran it using java command. On console I can see
> message as Login Successful but while accessing HDFS I am getting below
> error message:
>
>
>
> "Failed to specify server's kerberos principal name"
>
>
>
> Any suggestion for this?
>
>
>
> Thanks and Regards,
>
> Shashi
>
>
>
> On Mon, Nov 2, 2015 at 4:36 PM, andreina j <an...@huawei.com> wrote:
>
>
>
> Hi Shashi Vishwakarma ,
>
>
>
> You can follow below steps to perform HDFS operation using java code on a
> secure cluster
>
>
>
> 1.      Copy krb5.conf, hdfs.keytab and conf directory from installed
> cluster
>
> 2.       Create a maven project with dependeny hadoop-client
>
>     <dependency>
>
>     <groupId>org.apache.hadoop</groupId>
>
>    <artifactId>hadoop-client</artifactId>
>
>    <version><version>-SNAPSHOT</version>
>
>    </dependency>
>
>
>
> 3.      Build the maven project, to resolve all the dependencies
>
> 4.      Add conf directory to classpath.
>
> 5.      Use below sample code to perform HDFS operation.
>
>
>
>             public class KerberosTest {
>
>
>
>                public static void main(String[] args) throws IOException {
>
>                  // This should be ideally default. now just for this
> purpose overriding
>
>                  System.setProperty("java.security.krb5.conf",
> "D:\\data\\Desktop\\cluster-test\\krb5.conf");
>
>
>
>                  // Login using keytab if have access to keytab. else
>
>                  UserGroupInformation.loginUserFromKeytab("hdfs @
> HADOOP.COM",
>
>                      "D:\\data\\Desktop\\cluster-test\\conf\\hdfs.keytab");
>
>
>
>                  String dest = "/test/userupload/file";
>
>                  String localFile = "pom.xml";
>
>
>
>                  Configuration conf = new HdfsConfiguration();
>
>                  FileSystem fs = FileSystem.get(conf);
>
>                  FSDataOutputStream out = fs.create(new Path(dest));
>
>                  FileInputStream fIn = new FileInputStream(localFile);
>
>                  IOUtils.copyBytes(fIn, out, 1024);
>
>               }
>
>
>
>             }
>
>          Note: Change the paths mentioned above accordingly
>
>
>
> Regards,
>
> Andreina J.
>
>
>
> *From:* Shashi Vishwakarma [mailto:shashi.vish123@gmail.com]
> *Sent:* 02 November 2015 PM 01:18
>
>
> *To:* user@hadoop.apache.org
> *Subject:* Re: Utility to push data into HDFS
>
>
>
> Hi Naga and Chris,
>
>
>
> Yes you are right. I don't have hadoop installed on my windows machine and
> i wish to move my files from windows to remote hadoop cluster (on linux
> server).
>
>
>
> And also my cluster is Kerberos enabled. Can you please help here? Let me
> know the steps that should I follow to implement it?
>
>
>
> Thanks and Regards
>
> Shashi
>
>
>
>
>
>
>
> On Mon, Nov 2, 2015 at 7:33 AM, Naganarasimha G R (Naga) <
> garlanaganarasimha@huawei.com> wrote:
>
> Hi Shashi,
>
>
>
> Not sure i got your question right, but if its related to building of
> Hadoop on windows then i think what ever steps mentioned by James and Chris
> would be definitely help.
>
> But is your scenario to remotely(not on one of the nodes of cluster)
> access HDFS through java from either windows or linux machines ?
>
> In that case certain set of jars needs to be in client machine(refer
> hadoop-client/pom.xml) and subset of the server configurations (even if
> full not a problem) is required to access the HDFS and YARN
>
>
>
> @Chris Nauroth,  Are native components (winutils.exe and hadoop.dll),
> required in the remote machine ? AFAIK its not required, correct me if i am
> wrong !
>
>
>
> + Naga
>
>
>
>
> ------------------------------
>
>
>
> *From:* Chris Nauroth [cnauroth@hortonworks.com]
> *Sent:* Monday, November 02, 2015 02:10
> *To:* user@hadoop.apache.org
>
>
> *Subject:* Re: Utility to push data into HDFS
>
>
>
> In addition to the standard Hadoop jars available in an Apache Hadoop
> distro, Windows also requires the native components for Windows:
> winutils.exe and hadoop.dll.  This wiki page has more details on how that
> works:
>
>
>
> https://wiki.apache.org/hadoop/WindowsProblems
>
>
>
> --Chris Nauroth
>
>
>
> *From: *James Bond <bo...@gmail.com>
> *Reply-To: *"user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Date: *Sunday, November 1, 2015 at 9:35 AM
> *To: *"user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Subject: *Re: Utility to push data into HDFS
>
>
>
> I am guessing this should work -
>
>
>
>
> https://stackoverflow.com/questions/9722257/building-jar-that-includes-all-its-dependencies
>
>
>
> On Sun, Nov 1, 2015 at 8:15 PM, Shashi Vishwakarma <
> shashi.vish123@gmail.com> wrote:
>
> Hi Chris,
>
>
>
> Thanks for your reply. I agree WebHDFS is one of the option to access
> hadoop from windows or *nix. I wanted to know if I can write a java code
> will can be executed from windows?
>
>
>
> Ex:  java HDFSPut.java  <<- this java code should have FSShell cammand
> (hadoop fs -ls) written in java.
>
>
>
> In order to execute this , what are list items I should have on windows?
>
> For example hadoop jars etc.
>
>
>
> If you can throw some light on this then it would be great help.
>
>
>
> Thanks
>
> Shashi
>
>
>
>
>
>
>
>
>
>
>
> On Sun, Nov 1, 2015 at 1:39 AM, Chris Nauroth <cn...@hortonworks.com>
> wrote:
>
> Hello Shashi,
>
>
>
> Maybe I'm missing some context, but are the Hadoop FsShell commands
> sufficient?
>
>
>
>
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/FileSystemShell.html
>
>
>
> These commands work on both *nix and Windows.
>
>
>
> Another option would be WebHDFS, which just requires an HTTP client on
> your platform of choice.
>
>
>
>
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
>
>
>
> --Chris Nauroth
>
>
>
> *From: *Shashi Vishwakarma <sh...@gmail.com>
> *Reply-To: *"user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Date: *Saturday, October 31, 2015 at 5:46 AM
> *To: *"user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Subject: *Utility to push data into HDFS
>
>
>
> Hi
>
> I need build a common utility for unix/windows based system to push data
> into hadoop system. User can run that utility from any platform and should
> be able to push data into HDFS.
>
> Any suggestions ?
>
> Thanks
>
> Shashi
>
>
>
>
>
>
>
>
>

RE: Utility to push data into HDFS

Posted by Shashi Vishwakarma <sh...@gmail.com>.
Thanks all...It was a cluster issue...Its working for me now....:)
On 3 Nov 2015 7:01 am, "Vinayakumar B" <vi...@huawei.com> wrote:

> Hi Shashi,
>
>
>
>   Did you copy conf directory (ex: *<hadoop>/etc/hadoop *by default) from
> any of the cluster machine’s Hadoop installation as mentioned in #1 in
> Andreina’s reply below?
> I hope, if cluster is running successfully with Kerberos enabled, it
> should have a configuration “dfs.namenode.kerberos.principal"
>
>
>
>    Also you need to keep this directory ( yes, directory itself, not files
> inside it) in class path of your client program.
>
>
>
> -Vinay
>
>
>
> *From:* Shashi Vishwakarma [mailto:shashi.vish123@gmail.com]
> *Sent:* Monday, November 02, 2015 10:47 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Utility to push data into HDFS
>
>
>
> Hi Andreina,
>
>
>
> I used you java code and ran it using java command. On console I can see
> message as Login Successful but while accessing HDFS I am getting below
> error message:
>
>
>
> "Failed to specify server's kerberos principal name"
>
>
>
> Any suggestion for this?
>
>
>
> Thanks and Regards,
>
> Shashi
>
>
>
> On Mon, Nov 2, 2015 at 4:36 PM, andreina j <an...@huawei.com> wrote:
>
>
>
> Hi Shashi Vishwakarma ,
>
>
>
> You can follow below steps to perform HDFS operation using java code on a
> secure cluster
>
>
>
> 1.      Copy krb5.conf, hdfs.keytab and conf directory from installed
> cluster
>
> 2.       Create a maven project with dependeny hadoop-client
>
>     <dependency>
>
>     <groupId>org.apache.hadoop</groupId>
>
>    <artifactId>hadoop-client</artifactId>
>
>    <version><version>-SNAPSHOT</version>
>
>    </dependency>
>
>
>
> 3.      Build the maven project, to resolve all the dependencies
>
> 4.      Add conf directory to classpath.
>
> 5.      Use below sample code to perform HDFS operation.
>
>
>
>             public class KerberosTest {
>
>
>
>                public static void main(String[] args) throws IOException {
>
>                  // This should be ideally default. now just for this
> purpose overriding
>
>                  System.setProperty("java.security.krb5.conf",
> "D:\\data\\Desktop\\cluster-test\\krb5.conf");
>
>
>
>                  // Login using keytab if have access to keytab. else
>
>                  UserGroupInformation.loginUserFromKeytab("hdfs @
> HADOOP.COM",
>
>                      "D:\\data\\Desktop\\cluster-test\\conf\\hdfs.keytab");
>
>
>
>                  String dest = "/test/userupload/file";
>
>                  String localFile = "pom.xml";
>
>
>
>                  Configuration conf = new HdfsConfiguration();
>
>                  FileSystem fs = FileSystem.get(conf);
>
>                  FSDataOutputStream out = fs.create(new Path(dest));
>
>                  FileInputStream fIn = new FileInputStream(localFile);
>
>                  IOUtils.copyBytes(fIn, out, 1024);
>
>               }
>
>
>
>             }
>
>          Note: Change the paths mentioned above accordingly
>
>
>
> Regards,
>
> Andreina J.
>
>
>
> *From:* Shashi Vishwakarma [mailto:shashi.vish123@gmail.com]
> *Sent:* 02 November 2015 PM 01:18
>
>
> *To:* user@hadoop.apache.org
> *Subject:* Re: Utility to push data into HDFS
>
>
>
> Hi Naga and Chris,
>
>
>
> Yes you are right. I don't have hadoop installed on my windows machine and
> i wish to move my files from windows to remote hadoop cluster (on linux
> server).
>
>
>
> And also my cluster is Kerberos enabled. Can you please help here? Let me
> know the steps that should I follow to implement it?
>
>
>
> Thanks and Regards
>
> Shashi
>
>
>
>
>
>
>
> On Mon, Nov 2, 2015 at 7:33 AM, Naganarasimha G R (Naga) <
> garlanaganarasimha@huawei.com> wrote:
>
> Hi Shashi,
>
>
>
> Not sure i got your question right, but if its related to building of
> Hadoop on windows then i think what ever steps mentioned by James and Chris
> would be definitely help.
>
> But is your scenario to remotely(not on one of the nodes of cluster)
> access HDFS through java from either windows or linux machines ?
>
> In that case certain set of jars needs to be in client machine(refer
> hadoop-client/pom.xml) and subset of the server configurations (even if
> full not a problem) is required to access the HDFS and YARN
>
>
>
> @Chris Nauroth,  Are native components (winutils.exe and hadoop.dll),
> required in the remote machine ? AFAIK its not required, correct me if i am
> wrong !
>
>
>
> + Naga
>
>
>
>
> ------------------------------
>
>
>
> *From:* Chris Nauroth [cnauroth@hortonworks.com]
> *Sent:* Monday, November 02, 2015 02:10
> *To:* user@hadoop.apache.org
>
>
> *Subject:* Re: Utility to push data into HDFS
>
>
>
> In addition to the standard Hadoop jars available in an Apache Hadoop
> distro, Windows also requires the native components for Windows:
> winutils.exe and hadoop.dll.  This wiki page has more details on how that
> works:
>
>
>
> https://wiki.apache.org/hadoop/WindowsProblems
>
>
>
> --Chris Nauroth
>
>
>
> *From: *James Bond <bo...@gmail.com>
> *Reply-To: *"user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Date: *Sunday, November 1, 2015 at 9:35 AM
> *To: *"user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Subject: *Re: Utility to push data into HDFS
>
>
>
> I am guessing this should work -
>
>
>
>
> https://stackoverflow.com/questions/9722257/building-jar-that-includes-all-its-dependencies
>
>
>
> On Sun, Nov 1, 2015 at 8:15 PM, Shashi Vishwakarma <
> shashi.vish123@gmail.com> wrote:
>
> Hi Chris,
>
>
>
> Thanks for your reply. I agree WebHDFS is one of the option to access
> hadoop from windows or *nix. I wanted to know if I can write a java code
> will can be executed from windows?
>
>
>
> Ex:  java HDFSPut.java  <<- this java code should have FSShell cammand
> (hadoop fs -ls) written in java.
>
>
>
> In order to execute this , what are list items I should have on windows?
>
> For example hadoop jars etc.
>
>
>
> If you can throw some light on this then it would be great help.
>
>
>
> Thanks
>
> Shashi
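
On the "FsShell command written in Java" part of the question: org.apache.hadoop.fs.FsShell is the class behind hadoop fs, and it can be driven from Java through ToolRunner. A minimal sketch follows (the class name HdfsLs is made up for illustration; on a Kerberized cluster the same configuration and keytab login as in the KerberosTest sample would have to happen first):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FsShell;
    import org.apache.hadoop.hdfs.HdfsConfiguration;
    import org.apache.hadoop.util.ToolRunner;

    public class HdfsLs {
      public static void main(String[] args) throws Exception {
        // Reads core-site.xml / hdfs-site.xml from the classpath, like any other client
        Configuration conf = new HdfsConfiguration();

        // Equivalent of running "hadoop fs -ls /" from the shell
        int exitCode = ToolRunner.run(conf, new FsShell(), new String[] {"-ls", "/"});
        System.exit(exitCode);
      }
    }

The jars needed on the client side are the same hadoop-client dependencies discussed elsewhere in this thread.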
>
>
>
>
>
>
>
>
>
>
>
> On Sun, Nov 1, 2015 at 1:39 AM, Chris Nauroth <cn...@hortonworks.com>
> wrote:
>
> Hello Shashi,
>
>
>
> Maybe I'm missing some context, but are the Hadoop FsShell commands
> sufficient?
>
>
>
>
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/FileSystemShell.html
>
>
>
> These commands work on both *nix and Windows.
>
>
>
> Another option would be WebHDFS, which just requires an HTTP client on
> your platform of choice.
>
>
>
>
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
>
>
>
> --Chris Nauroth
>
>
>
> *From: *Shashi Vishwakarma <sh...@gmail.com>
> *Reply-To: *"user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Date: *Saturday, October 31, 2015 at 5:46 AM
> *To: *"user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Subject: *Utility to push data into HDFS
>
>
>
> Hi
>
> I need build a common utility for unix/windows based system to push data
> into hadoop system. User can run that utility from any platform and should
> be able to push data into HDFS.
>
> Any suggestions ?
>
> Thanks
>
> Shashi
>
>
>
>
>
>
>
>
>

RE: Utility to push data into HDFS

Posted by Shashi Vishwakarma <sh...@gmail.com>.
Thanks all... It was a cluster issue. It's working for me now. :)
On 3 Nov 2015 7:01 am, "Vinayakumar B" <vi...@huawei.com> wrote:

> Hi Shashi,
>
>
>
>   Did you copy conf directory (ex: *<hadoop>/etc/hadoop *by default) from
> any of the cluster machine’s Hadoop installation as mentioned in #1 in
> Andreina’s reply below?
> I hope, if cluster is running successfully with Kerberos enabled, it
> should have a configuration “dfs.namenode.kerberos.principal"
>
>
>
>    Also you need to keep this directory ( yes, directory itself, not files
> inside it) in class path of your client program.
>
>
>
> -Vinay
>
>
>
> *From:* Shashi Vishwakarma [mailto:shashi.vish123@gmail.com]
> *Sent:* Monday, November 02, 2015 10:47 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Utility to push data into HDFS
>
>
>
> Hi Andreina,
>
>
>
> I used you java code and ran it using java command. On console I can see
> message as Login Successful but while accessing HDFS I am getting below
> error message:
>
>
>
> "Failed to specify server's kerberos principal name"
>
>
>
> Any suggestion for this?
>
>
>
> Thanks and Regards,
>
> Shashi
>
>
>
> On Mon, Nov 2, 2015 at 4:36 PM, andreina j <an...@huawei.com> wrote:
>
>
>
> Hi Shashi Vishwakarma ,
>
>
>
> You can follow below steps to perform HDFS operation using java code on a
> secure cluster
>
>
>
> 1.      Copy krb5.conf, hdfs.keytab and conf directory from installed
> cluster
>
> 2.       Create a maven project with dependency hadoop-client
>
>     <dependency>
>
>     <groupId>org.apache.hadoop</groupId>
>
>    <artifactId>hadoop-client</artifactId>
>
>    <version><version>-SNAPSHOT</version>
>
>    </dependency>
>
>
>
> 3.      Build the maven project, to resolve all the dependencies
>
> 4.      Add conf directory to classpath.
>
> 5.      Use below sample code to perform HDFS operation.
>
>
>
>             public class KerberosTest {
>
>                public static void main(String[] args) throws IOException {
>                  // Ideally the default krb5.conf is picked up; overridden here only for this example
>                  System.setProperty("java.security.krb5.conf",
>                      "D:\\data\\Desktop\\cluster-test\\krb5.conf");
>
>                  // Login using the keytab, if you have access to one
>                  UserGroupInformation.loginUserFromKeytab("hdfs@HADOOP.COM",
>                      "D:\\data\\Desktop\\cluster-test\\conf\\hdfs.keytab");
>
>                  String dest = "/test/userupload/file";
>                  String localFile = "pom.xml";
>
>                  Configuration conf = new HdfsConfiguration();
>                  FileSystem fs = FileSystem.get(conf);
>                  FSDataOutputStream out = fs.create(new Path(dest));
>                  FileInputStream fIn = new FileInputStream(localFile);
>                  IOUtils.copyBytes(fIn, out, 1024);
>               }
>
>             }
>
>          Note: Change the paths mentioned above accordingly
>
>
>
> Regards,
>
> Andreina J.
>
>
>
> *From:* Shashi Vishwakarma [mailto:shashi.vish123@gmail.com]
> *Sent:* 02 November 2015 PM 01:18
>
>
> *To:* user@hadoop.apache.org
> *Subject:* Re: Utility to push data into HDFS
>
>
>
> Hi Naga and Chris,
>
>
>
> Yes you are right. I don't have hadoop installed on my windows machine and
> i wish to move my files from windows to remote hadoop cluster (on linux
> server).
>
>
>
> And also my cluster is Kerberos enabled. Can you please help here? Let me
> know the steps that should I follow to implement it?
>
>
>
> Thanks and Regards
>
> Shashi
>
>
>
>
>
>
>
> On Mon, Nov 2, 2015 at 7:33 AM, Naganarasimha G R (Naga) <
> garlanaganarasimha@huawei.com> wrote:
>
> Hi Shashi,
>
>
>
> Not sure i got your question right, but if its related to building of
> Hadoop on windows then i think what ever steps mentioned by James and Chris
> would be definitely help.
>
> But is your scenario to remotely(not on one of the nodes of cluster)
> access HDFS through java from either windows or linux machines ?
>
> In that case certain set of jars needs to be in client machine(refer
> hadoop-client/pom.xml) and subset of the server configurations (even if
> full not a problem) is required to access the HDFS and YARN
>
>
>
> @Chris Nauroth,  Are native components (winutils.exe and hadoop.dll),
> required in the remote machine ? AFAIK its not required, correct me if i am
> wrong !
>
>
>
> + Naga
>
>
>
>
> ------------------------------
>
>
>
> *From:* Chris Nauroth [cnauroth@hortonworks.com]
> *Sent:* Monday, November 02, 2015 02:10
> *To:* user@hadoop.apache.org
>
>
> *Subject:* Re: Utility to push data into HDFS
>
>
>
> In addition to the standard Hadoop jars available in an Apache Hadoop
> distro, Windows also requires the native components for Windows:
> winutils.exe and hadoop.dll.  This wiki page has more details on how that
> works:
>
>
>
> https://wiki.apache.org/hadoop/WindowsProblems
>
>
>
> --Chris Nauroth
>
>
>
> *From: *James Bond <bo...@gmail.com>
> *Reply-To: *"user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Date: *Sunday, November 1, 2015 at 9:35 AM
> *To: *"user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Subject: *Re: Utility to push data into HDFS
>
>
>
> I am guessing this should work -
>
>
>
>
> https://stackoverflow.com/questions/9722257/building-jar-that-includes-all-its-dependencies
>
>
>
> On Sun, Nov 1, 2015 at 8:15 PM, Shashi Vishwakarma <
> shashi.vish123@gmail.com> wrote:
>
> Hi Chris,
>
>
>
> Thanks for your reply. I agree WebHDFS is one of the option to access
> hadoop from windows or *nix. I wanted to know if I can write a java code
> will can be executed from windows?
>
>
>
> Ex:  java HDFSPut.java  <<- this java code should have FSShell cammand
> (hadoop fs -ls) written in java.
>
>
>
> In order to execute this , what are list items I should have on windows?
>
> For example hadoop jars etc.
>
>
>
> If you can throw some light on this then it would be great help.
>
>
>
> Thanks
>
> Shashi
>
>
>
>
>
>
>
>
>
>
>
> On Sun, Nov 1, 2015 at 1:39 AM, Chris Nauroth <cn...@hortonworks.com>
> wrote:
>
> Hello Shashi,
>
>
>
> Maybe I'm missing some context, but are the Hadoop FsShell commands
> sufficient?
>
>
>
>
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/FileSystemShell.html
>
>
>
> These commands work on both *nix and Windows.
>
>
>
> Another option would be WebHDFS, which just requires an HTTP client on
> your platform of choice.
>
>
>
>
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
>
>
>
> --Chris Nauroth
>
>
>
> *From: *Shashi Vishwakarma <sh...@gmail.com>
> *Reply-To: *"user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Date: *Saturday, October 31, 2015 at 5:46 AM
> *To: *"user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Subject: *Utility to push data into HDFS
>
>
>
> Hi
>
> I need build a common utility for unix/windows based system to push data
> into hadoop system. User can run that utility from any platform and should
> be able to push data into HDFS.
>
> Any suggestions ?
>
> Thanks
>
> Shashi
>
>
>
>
>
>
>
>
>

RE: Utility to push data into HDFS

Posted by Vinayakumar B <vi...@huawei.com>.
Hi Shashi,

  Did you copy the conf directory (ex: <hadoop>/etc/hadoop by default) from one of the cluster machines' Hadoop installations, as mentioned in #1 of Andreina's reply below?
If the cluster is running successfully with Kerberos enabled, that configuration should already contain “dfs.namenode.kerberos.principal”.

   Also, you need to keep this directory (yes, the directory itself, not the files inside it) on the classpath of your client program.

-Vinay
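
If putting the conf directory itself on the classpath is awkward, the same information can be handed to the client programmatically. A rough sketch is below; the helper name ClientConf is made up for illustration, and the principal value is only a placeholder — copy the real one from the cluster's hdfs-site.xml.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.HdfsConfiguration;
    import org.apache.hadoop.security.UserGroupInformation;

    public class ClientConf {

      /** Builds a client Configuration without requiring the conf directory on the classpath. */
      public static Configuration build(String confDir) {
        Configuration conf = new HdfsConfiguration();

        // Load the cluster's client-side XML files explicitly
        conf.addResource(new Path(confDir, "core-site.xml"));
        conf.addResource(new Path(confDir, "hdfs-site.xml"));

        // Or set the missing property directly; the value below is only a placeholder
        conf.set("dfs.namenode.kerberos.principal", "nn/_HOST@HADOOP.COM");

        UserGroupInformation.setConfiguration(conf);
        return conf;
      }
    }

A client like the KerberosTest sample could call ClientConf.build(...) before the keytab login instead of relying on the classpath.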

From: Shashi Vishwakarma [mailto:shashi.vish123@gmail.com]
Sent: Monday, November 02, 2015 10:47 PM
To: user@hadoop.apache.org
Subject: Re: Utility to push data into HDFS

Hi Andreina,

I used your java code and ran it using the java command. On the console I can see the message "Login Successful", but while accessing HDFS I am getting the error message below:

"Failed to specify server's kerberos principal name"

Any suggestion for this?

Thanks and Regards,
Shashi

On Mon, Nov 2, 2015 at 4:36 PM, andreina j <an...@huawei.com>> wrote:

Hi Shashi Vishwakarma ,

You can follow below steps to perform HDFS operation using java code on a secure cluster


1.      Copy krb5.conf, hdfs.keytab and conf directory from installed cluster

2.       Create a maven project with dependency hadoop-client

    <dependency>

    <groupId>org.apache.hadoop</groupId>

   <artifactId>hadoop-client</artifactId>

   <version>x.y.z-SNAPSHOT</version>  <!-- replace x.y.z with the cluster's Hadoop version -->

   </dependency>



3.      Build the maven project, to resolve all the dependencies

4.      Add conf directory to classpath.

5.      Use below sample code to perform HDFS operation.


            public class KerberosTest {

               public static void main(String[] args) throws IOException {
                 // Ideally the default krb5.conf is picked up; overridden here only for this example
                 System.setProperty("java.security.krb5.conf", "D:\\data\\Desktop\\cluster-test\\krb5.conf");

                 // Login using the keytab, if you have access to one
                 UserGroupInformation.loginUserFromKeytab("hdfs@HADOOP.COM",
                     "D:\\data\\Desktop\\cluster-test\\conf\\hdfs.keytab");

                 String dest = "/test/userupload/file";
                 String localFile = "pom.xml";

                 Configuration conf = new HdfsConfiguration();
                 FileSystem fs = FileSystem.get(conf);
                 FSDataOutputStream out = fs.create(new Path(dest));
                 FileInputStream fIn = new FileInputStream(localFile);
                 IOUtils.copyBytes(fIn, out, 1024);
              }

            }
         Note: Change the paths mentioned above accordingly

Regards,
Andreina J.

From: Shashi Vishwakarma [mailto:shashi.vish123@gmail.com<ma...@gmail.com>]
Sent: 02 November 2015 PM 01:18

To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: Utility to push data into HDFS

Hi Naga and Chris,

Yes, you are right. I don't have Hadoop installed on my Windows machine and I wish to move my files from Windows to the remote Hadoop cluster (on a Linux server).

Also, my cluster is Kerberos enabled. Can you please help here? Let me know the steps that I should follow to implement it?

Thanks and Regards
Shashi



On Mon, Nov 2, 2015 at 7:33 AM, Naganarasimha G R (Naga) <ga...@huawei.com>> wrote:
Hi Shashi,

Not sure I got your question right, but if it's related to building Hadoop on Windows then I think the steps mentioned by James and Chris would definitely help.
But is your scenario to remotely (not on one of the nodes of the cluster) access HDFS through Java from either Windows or Linux machines?
In that case a certain set of jars needs to be on the client machine (refer to hadoop-client/pom.xml) and a subset of the server configurations (the full set is not a problem either) is required to access HDFS and YARN

@Chris Nauroth,  Are native components (winutils.exe and hadoop.dll), required in the remote machine ? AFAIK its not required, correct me if i am wrong !

+ Naga


________________________________

From: Chris Nauroth [cnauroth@hortonworks.com<ma...@hortonworks.com>]
Sent: Monday, November 02, 2015 02:10
To: user@hadoop.apache.org<ma...@hadoop.apache.org>

Subject: Re: Utility to push data into HDFS

In addition to the standard Hadoop jars available in an Apache Hadoop distro, Windows also requires the native components for Windows: winutils.exe and hadoop.dll.  This wiki page has more details on how that works:

https://wiki.apache.org/hadoop/WindowsProblems

--Chris Nauroth

From: James Bond <bo...@gmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Sunday, November 1, 2015 at 9:35 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Re: Utility to push data into HDFS

I am guessing this should work -

https://stackoverflow.com/questions/9722257/building-jar-that-includes-all-its-dependencies

On Sun, Nov 1, 2015 at 8:15 PM, Shashi Vishwakarma <sh...@gmail.com>> wrote:
Hi Chris,

Thanks for your reply. I agree WebHDFS is one of the options to access Hadoop from Windows or *nix. I wanted to know if I can write Java code that can be executed from Windows.

Ex:  java HDFSPut.java  <<- this Java code should have the FsShell command (hadoop fs -ls) written in Java.

In order to execute this, what are the items I should have on Windows?
For example hadoop jars etc.

If you can throw some light on this then it would be great help.

Thanks
Shashi





On Sun, Nov 1, 2015 at 1:39 AM, Chris Nauroth <cn...@hortonworks.com>> wrote:
Hello Shashi,

Maybe I'm missing some context, but are the Hadoop FsShell commands sufficient?

http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/FileSystemShell.html

These commands work on both *nix and Windows.

Another option would be WebHDFS, which just requires an HTTP client on your platform of choice.

http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/WebHDFS.html

--Chris Nauroth
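
To make the WebHDFS route concrete, a rough sketch of the two-step upload with plain HttpURLConnection follows. The host, port, user and paths are placeholders (50070 is the default NameNode HTTP port in the 2.x line), and on a Kerberos-enabled cluster the requests would additionally need SPNEGO authentication or a delegation token.

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class WebHdfsPut {

      public static void main(String[] args) throws IOException {
        String nameNode = "http://namenode.example.com:50070";    // placeholder host and port
        String target = "/test/userupload/file";
        String createUrl = nameNode + "/webhdfs/v1" + target
            + "?op=CREATE&overwrite=true&user.name=hdfs";         // user.name only applies without Kerberos

        // Step 1: ask the NameNode where to write; it answers with a redirect to a DataNode
        HttpURLConnection nn = (HttpURLConnection) new URL(createUrl).openConnection();
        nn.setRequestMethod("PUT");
        nn.setInstanceFollowRedirects(false);
        String dataNodeUrl = nn.getHeaderField("Location");
        nn.disconnect();

        // Step 2: send the file content to the DataNode location returned above
        HttpURLConnection dn = (HttpURLConnection) new URL(dataNodeUrl).openConnection();
        dn.setRequestMethod("PUT");
        dn.setDoOutput(true);
        try (InputStream in = new FileInputStream("pom.xml");
             OutputStream out = dn.getOutputStream()) {
          byte[] buffer = new byte[4096];
          int read;
          while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
          }
        }
        System.out.println("HTTP " + dn.getResponseCode());       // 201 Created on success
        dn.disconnect();
      }
    }

The same two calls can be made with curl or any other HTTP client, which is what makes this option attractive for a utility that has to run unchanged on both Windows and *nix.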

From: Shashi Vishwakarma <sh...@gmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Saturday, October 31, 2015 at 5:46 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Utility to push data into HDFS


Hi

I need build a common utility for unix/windows based system to push data into hadoop system. User can run that utility from any platform and should be able to push data into HDFS.

Any suggestions ?

Thanks

Shashi





Re: Utility to push data into HDFS

Posted by Shashi Vishwakarma <sh...@gmail.com>.
Hi Andreina,

I used your java code and ran it using the java command. On the console I can
see the message "Login Successful", but while accessing HDFS I am getting the
error message below:

"Failed to specify server's kerberos principal name"

Any suggestion for this?

Thanks and Regards,
Shashi

On Mon, Nov 2, 2015 at 4:36 PM, andreina j <an...@huawei.com> wrote:

>
>
> Hi Shashi Vishwakarma ,
>
>
>
> You can follow below steps to perform HDFS operation using java code on a
> secure cluster
>
>
>
> 1.      Copy krb5.conf, hdfs.keytab and conf directory from installed
> cluster
>
> 2.       Create a maven project with dependency hadoop-client
>
>     <dependency>
>
>     <groupId>org.apache.hadoop</groupId>
>
>    <artifactId>hadoop-client</artifactId>
>
>    <version><version>-SNAPSHOT</version>
>
>    </dependency>
>
>
>
> 3.      Build the maven project, to resolve all the dependencies
>
> 4.      Add conf directory to classpath.
>
> 5.      Use below sample code to perform HDFS operation.
>
>
>
>             public class KerberosTest {
>
>                public static void main(String[] args) throws IOException {
>                  // Ideally the default krb5.conf is picked up; overridden here only for this example
>                  System.setProperty("java.security.krb5.conf",
>                      "D:\\data\\Desktop\\cluster-test\\krb5.conf");
>
>                  // Login using the keytab, if you have access to one
>                  UserGroupInformation.loginUserFromKeytab("hdfs@HADOOP.COM",
>                      "D:\\data\\Desktop\\cluster-test\\conf\\hdfs.keytab");
>
>                  String dest = "/test/userupload/file";
>                  String localFile = "pom.xml";
>
>                  Configuration conf = new HdfsConfiguration();
>                  FileSystem fs = FileSystem.get(conf);
>                  FSDataOutputStream out = fs.create(new Path(dest));
>                  FileInputStream fIn = new FileInputStream(localFile);
>                  IOUtils.copyBytes(fIn, out, 1024);
>               }
>
>             }
>
>          Note: Change the paths mentioned above accordingly
>
>
>
> Regards,
>
> Andreina J.
>
>
>
> *From:* Shashi Vishwakarma [mailto:shashi.vish123@gmail.com]
> *Sent:* 02 November 2015 PM 01:18
>
> *To:* user@hadoop.apache.org
> *Subject:* Re: Utility to push data into HDFS
>
>
>
> Hi Naga and Chris,
>
>
>
> Yes you are right. I don't have hadoop installed on my windows machine and
> i wish to move my files from windows to remote hadoop cluster (on linux
> server).
>
>
>
> And also my cluster is Kerberos enabled. Can you please help here? Let me
> know the steps that should I follow to implement it?
>
>
>
> Thanks and Regards
>
> Shashi
>
>
>
>
>
>
>
> On Mon, Nov 2, 2015 at 7:33 AM, Naganarasimha G R (Naga) <
> garlanaganarasimha@huawei.com> wrote:
>
> Hi Shashi,
>
>
>
> Not sure i got your question right, but if its related to building of
> Hadoop on windows then i think what ever steps mentioned by James and Chris
> would be definitely help.
>
> But is your scenario to remotely(not on one of the nodes of cluster)
> access HDFS through java from either windows or linux machines ?
>
> In that case certain set of jars needs to be in client machine(refer
> hadoop-client/pom.xml) and subset of the server configurations (even if
> full not a problem) is required to access the HDFS and YARN
>
>
>
> @Chris Nauroth,  Are native components (winutils.exe and hadoop.dll),
> required in the remote machine ? AFAIK its not required, correct me if i am
> wrong !
>
>
>
> + Naga
>
>
>
>
> ------------------------------
>
>
>
> *From:* Chris Nauroth [cnauroth@hortonworks.com]
> *Sent:* Monday, November 02, 2015 02:10
> *To:* user@hadoop.apache.org
>
>
> *Subject:* Re: Utility to push data into HDFS
>
>
>
> In addition to the standard Hadoop jars available in an Apache Hadoop
> distro, Windows also requires the native components for Windows:
> winutils.exe and hadoop.dll.  This wiki page has more details on how that
> works:
>
>
>
> https://wiki.apache.org/hadoop/WindowsProblems
>
>
>
> --Chris Nauroth
>
>
>
> *From: *James Bond <bo...@gmail.com>
> *Reply-To: *"user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Date: *Sunday, November 1, 2015 at 9:35 AM
> *To: *"user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Subject: *Re: Utility to push data into HDFS
>
>
>
> I am guessing this should work -
>
>
>
>
> https://stackoverflow.com/questions/9722257/building-jar-that-includes-all-its-dependencies
>
>
>
> On Sun, Nov 1, 2015 at 8:15 PM, Shashi Vishwakarma <
> shashi.vish123@gmail.com> wrote:
>
> Hi Chris,
>
>
>
> Thanks for your reply. I agree WebHDFS is one of the option to access
> hadoop from windows or *nix. I wanted to know if I can write a java code
> will can be executed from windows?
>
>
>
> Ex:  java HDFSPut.java  <<- this java code should have FSShell cammand
> (hadoop fs -ls) written in java.
>
>
>
> In order to execute this , what are list items I should have on windows?
>
> For example hadoop jars etc.
>
>
>
> If you can throw some light on this then it would be great help.
>
>
>
> Thanks
>
> Shashi
>
>
>
>
>
>
>
>
>
>
>
> On Sun, Nov 1, 2015 at 1:39 AM, Chris Nauroth <cn...@hortonworks.com>
> wrote:
>
> Hello Shashi,
>
>
>
> Maybe I'm missing some context, but are the Hadoop FsShell commands
> sufficient?
>
>
>
>
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/FileSystemShell.html
>
>
>
> These commands work on both *nix and Windows.
>
>
>
> Another option would be WebHDFS, which just requires an HTTP client on
> your platform of choice.
>
>
>
>
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
>
>
>
> --Chris Nauroth
>
>
>
> *From: *Shashi Vishwakarma <sh...@gmail.com>
> *Reply-To: *"user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Date: *Saturday, October 31, 2015 at 5:46 AM
> *To: *"user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Subject: *Utility to push data into HDFS
>
>
>
> Hi
>
> I need build a common utility for unix/windows based system to push data
> into hadoop system. User can run that utility from any platform and should
> be able to push data into HDFS.
>
> Any suggestions ?
>
> Thanks
>
> Shashi
>
>
>
>
>
>
>

>
> Any suggestions ?
>
> Thanks
>
> Shashi
>
>
>
>
>
>
>

RE: Utility to push data into HDFS

Posted by andreina j <an...@huawei.com>.
Hi Shashi Vishwakarma,

You can follow the steps below to perform HDFS operations using Java code on a secure cluster:


1.      Copy krb5.conf, hdfs.keytab and the conf directory from the installed cluster.

2.      Create a Maven project with a dependency on hadoop-client:

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.1</version>  <!-- match your cluster's Hadoop version -->
    </dependency>


3.      Build the Maven project to resolve all the dependencies.

4.      Add the conf directory to the classpath.

5.      Use the sample code below to perform the HDFS operation.


            import java.io.FileInputStream;
            import java.io.IOException;

            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.fs.FSDataOutputStream;
            import org.apache.hadoop.fs.FileSystem;
            import org.apache.hadoop.fs.Path;
            import org.apache.hadoop.hdfs.HdfsConfiguration;
            import org.apache.hadoop.io.IOUtils;
            import org.apache.hadoop.security.UserGroupInformation;

            public class KerberosTest {

               public static void main(String[] args) throws IOException {
                 // The default krb5.conf should ideally be used; it is overridden here only for this test.
                 System.setProperty("java.security.krb5.conf", "D:\\data\\Desktop\\cluster-test\\krb5.conf");

                 // Log in from the keytab (if no keytab is available, use a kinit ticket cache instead).
                 UserGroupInformation.loginUserFromKeytab("hdfs@HADOOP.COM",
                     "D:\\data\\Desktop\\cluster-test\\conf\\hdfs.keytab");

                 String dest = "/test/userupload/file";
                 String localFile = "pom.xml";

                 // Picks up core-site.xml / hdfs-site.xml from the conf directory on the classpath.
                 Configuration conf = new HdfsConfiguration();
                 FileSystem fs = FileSystem.get(conf);
                 FSDataOutputStream out = fs.create(new Path(dest));
                 FileInputStream fIn = new FileInputStream(localFile);
                 IOUtils.copyBytes(fIn, out, 1024, true);  // 'true' closes both streams once the copy finishes
              }

            }
         Note: Change the paths mentioned above accordingly
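
To run the sample, the conf directory copied in step 1 must be on the classpath
together with the hadoop-client jars, so that core-site.xml and hdfs-site.xml
are picked up; for example (paths are placeholders, and ';' is the separator on
Windows): java -cp conf:target/classes:target/dependency/* KerberosTest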

Regards,
Andreina J.


From: Shashi Vishwakarma [mailto:shashi.vish123@gmail.com]
Sent: 02 November 2015 PM 01:18
To: user@hadoop.apache.org
Subject: Re: Utility to push data into HDFS

Hi Naga and Chris,

Yes you are right. I don't have hadoop installed on my windows machine and i wish to move my files from windows to remote hadoop cluster (on linux server).

And also my cluster is Kerberos enabled. Can you please help here? Let me know the steps that should I follow to implement it?

Thanks and Regards
Shashi



On Mon, Nov 2, 2015 at 7:33 AM, Naganarasimha G R (Naga) <ga...@huawei.com>> wrote:
Hi Shashi,

Not sure i got your question right, but if its related to building of Hadoop on windows then i think what ever steps mentioned by James and Chris would be definitely help.
But is your scenario to remotely(not on one of the nodes of cluster) access HDFS through java from either windows or linux machines ?
In that case certain set of jars needs to be in client machine(refer hadoop-client/pom.xml) and subset of the server configurations (even if full not a problem) is required to access the HDFS and YARN

@Chris Nauroth,  Are native components (winutils.exe and hadoop.dll), required in the remote machine ? AFAIK its not required, correct me if i am wrong !

+ Naga


________________________________

From: Chris Nauroth [cnauroth@hortonworks.com<ma...@hortonworks.com>]
Sent: Monday, November 02, 2015 02:10
To: user@hadoop.apache.org<ma...@hadoop.apache.org>

Subject: Re: Utility to push data into HDFS

In addition to the standard Hadoop jars available in an Apache Hadoop distro, Windows also requires the native components for Windows: winutils.exe and hadoop.dll.  This wiki page has more details on how that works:

https://wiki.apache.org/hadoop/WindowsProblems

--Chris Nauroth

From: James Bond <bo...@gmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Sunday, November 1, 2015 at 9:35 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Re: Utility to push data into HDFS

I am guessing this should work -

https://stackoverflow.com/questions/9722257/building-jar-that-includes-all-its-dependencies

On Sun, Nov 1, 2015 at 8:15 PM, Shashi Vishwakarma <sh...@gmail.com>> wrote:
Hi Chris,

Thanks for your reply. I agree WebHDFS is one of the option to access hadoop from windows or *nix. I wanted to know if I can write a java code will can be executed from windows?

Ex:  java HDFSPut.java  <<- this java code should have FSShell cammand (hadoop fs -ls) written in java.

In order to execute this , what are list items I should have on windows?
For example hadoop jars etc.

If you can throw some light on this then it would be great help.

Thanks
Shashi





On Sun, Nov 1, 2015 at 1:39 AM, Chris Nauroth <cn...@hortonworks.com>> wrote:
Hello Shashi,

Maybe I'm missing some context, but are the Hadoop FsShell commands sufficient?

http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/FileSystemShell.html

These commands work on both *nix and Windows.

Another option would be WebHDFS, which just requires an HTTP client on your platform of choice.

http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/WebHDFS.html

--Chris Nauroth

From: Shashi Vishwakarma <sh...@gmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Saturday, October 31, 2015 at 5:46 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Utility to push data into HDFS


Hi

I need build a common utility for unix/windows based system to push data into hadoop system. User can run that utility from any platform and should be able to push data into HDFS.

Any suggestions ?

Thanks

Shashi




Re: Utility to push data into HDFS

Posted by Shashi Vishwakarma <sh...@gmail.com>.
Hi Naga and Chris,

Yes, you are right. I don't have Hadoop installed on my Windows machine, and
I wish to move my files from Windows to a remote Hadoop cluster (on a Linux
server).

Also, my cluster is Kerberos enabled. Can you please help here? Let me
know the steps that I should follow to implement it.

Thanks and Regards
Shashi



On Mon, Nov 2, 2015 at 7:33 AM, Naganarasimha G R (Naga) <
garlanaganarasimha@huawei.com> wrote:

> Hi Shashi,
>
> Not sure i got your question right, but if its related to building of
> Hadoop on windows then i think what ever steps mentioned by James and Chris
> would be definitely help.
> But is your scenario to remotely(not on one of the nodes of cluster)
> access HDFS through java from either windows or linux machines ?
> In that case certain set of jars needs to be in client machine(refer
> hadoop-client/pom.xml) and subset of the server configurations (even if
> full not a problem) is required to access the HDFS and YARN
>
> @Chris Nauroth,  Are native components (winutils.exe and hadoop.dll),
> required in the remote machine ? AFAIK its not required, correct me if i am
> wrong !
>
> + Naga
>
>
> ------------------------------
>
> *From:* Chris Nauroth [cnauroth@hortonworks.com]
> *Sent:* Monday, November 02, 2015 02:10
> *To:* user@hadoop.apache.org
>
> *Subject:* Re: Utility to push data into HDFS
>
> In addition to the standard Hadoop jars available in an Apache Hadoop
> distro, Windows also requires the native components for Windows:
> winutils.exe and hadoop.dll.  This wiki page has more details on how that
> works:
>
> https://wiki.apache.org/hadoop/WindowsProblems
>
> --Chris Nauroth
>
> From: James Bond <bo...@gmail.com>
> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Date: Sunday, November 1, 2015 at 9:35 AM
> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Subject: Re: Utility to push data into HDFS
>
> I am guessing this should work -
>
>
> https://stackoverflow.com/questions/9722257/building-jar-that-includes-all-its-dependencies
>
> On Sun, Nov 1, 2015 at 8:15 PM, Shashi Vishwakarma <
> shashi.vish123@gmail.com> wrote:
>
>> Hi Chris,
>>
>> Thanks for your reply. I agree WebHDFS is one of the option to access
>> hadoop from windows or *nix. I wanted to know if I can write a java code
>> will can be executed from windows?
>>
>> Ex:  java HDFSPut.java  <<- this java code should have FSShell cammand
>> (hadoop fs -ls) written in java.
>>
>> In order to execute this , what are list items I should have on windows?
>> For example hadoop jars etc.
>>
>> If you can throw some light on this then it would be great help.
>>
>> Thanks
>> Shashi
>>
>>
>>
>>
>>
>> On Sun, Nov 1, 2015 at 1:39 AM, Chris Nauroth <cn...@hortonworks.com>
>> wrote:
>>
>>> Hello Shashi,
>>>
>>> Maybe I'm missing some context, but are the Hadoop FsShell commands
>>> sufficient?
>>>
>>>
>>> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/FileSystemShell.html
>>>
>>> These commands work on both *nix and Windows.
>>>
>>> Another option would be WebHDFS, which just requires an HTTP client on
>>> your platform of choice.
>>>
>>>
>>> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
>>>
>>> --Chris Nauroth
>>>
>>> From: Shashi Vishwakarma <sh...@gmail.com>
>>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>>> Date: Saturday, October 31, 2015 at 5:46 AM
>>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>>> Subject: Utility to push data into HDFS
>>>
>>> Hi
>>>
>>> I need build a common utility for unix/windows based system to push data
>>> into hadoop system. User can run that utility from any platform and should
>>> be able to push data into HDFS.
>>>
>>> Any suggestions ?
>>>
>>> Thanks
>>>
>>> Shashi
>>>
>>
>>
>

RE: Utility to push data into HDFS

Posted by "Naganarasimha G R (Naga)" <ga...@huawei.com>.
Hi Shashi,

Not sure I got your question right, but if it is related to building Hadoop on Windows, then I think the steps mentioned by James and Chris would definitely help.
But is your scenario to remotely (not on one of the nodes of the cluster) access HDFS through Java from either Windows or Linux machines?
In that case a certain set of jars needs to be on the client machine (refer to hadoop-client/pom.xml), and a subset of the server configurations (the full set is also not a problem) is required to access HDFS and YARN.

@Chris Nauroth, are the native components (winutils.exe and hadoop.dll) required on the remote machine? AFAIK they are not required; correct me if I am wrong!

+ Naga
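
A bare-bones sketch of that remote-client case (for an unsecured cluster; the
NameNode host/port and file paths are placeholder assumptions, and the
hadoop-client jars plus the cluster's *-site.xml files are assumed to be on
the client's classpath):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RemotePut {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Normally read from core-site.xml on the classpath; set explicitly here for clarity.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
        FileSystem fs = FileSystem.get(conf);
        // Copy a local file from the client machine into HDFS.
        fs.copyFromLocalFile(new Path("data/input.csv"), new Path("/test/userupload/input.csv"));
        fs.close();
      }
    }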


________________________________

From: Chris Nauroth [cnauroth@hortonworks.com]
Sent: Monday, November 02, 2015 02:10
To: user@hadoop.apache.org
Subject: Re: Utility to push data into HDFS

In addition to the standard Hadoop jars available in an Apache Hadoop distro, Windows also requires the native components for Windows: winutils.exe and hadoop.dll.  This wiki page has more details on how that works:

https://wiki.apache.org/hadoop/WindowsProblems

--Chris Nauroth

From: James Bond <bo...@gmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Sunday, November 1, 2015 at 9:35 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Re: Utility to push data into HDFS

I am guessing this should work -

https://stackoverflow.com/questions/9722257/building-jar-that-includes-all-its-dependencies

On Sun, Nov 1, 2015 at 8:15 PM, Shashi Vishwakarma <sh...@gmail.com>> wrote:
Hi Chris,

Thanks for your reply. I agree WebHDFS is one of the option to access hadoop from windows or *nix. I wanted to know if I can write a java code will can be executed from windows?

Ex:  java HDFSPut.java  <<- this java code should have FSShell cammand (hadoop fs -ls) written in java.

In order to execute this , what are list items I should have on windows?
For example hadoop jars etc.

If you can throw some light on this then it would be great help.

Thanks
Shashi





On Sun, Nov 1, 2015 at 1:39 AM, Chris Nauroth <cn...@hortonworks.com>> wrote:
Hello Shashi,

Maybe I'm missing some context, but are the Hadoop FsShell commands sufficient?

http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/FileSystemShell.html

These commands work on both *nix and Windows.

Another option would be WebHDFS, which just requires an HTTP client on your platform of choice.

http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/WebHDFS.html

--Chris Nauroth

From: Shashi Vishwakarma <sh...@gmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Saturday, October 31, 2015 at 5:46 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Utility to push data into HDFS


Hi

I need build a common utility for unix/windows based system to push data into hadoop system. User can run that utility from any platform and should be able to push data into HDFS.

Any suggestions ?

Thanks

Shashi



Re: Utility to push data into HDFS

Posted by Chris Nauroth <cn...@hortonworks.com>.
In addition to the standard Hadoop jars available in an Apache Hadoop distro, Windows also requires the native components for Windows: winutils.exe and hadoop.dll.  This wiki page has more details on how that works:

https://wiki.apache.org/hadoop/WindowsProblems

--Chris Nauroth
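
If the native pieces do turn out to be needed on a Windows client, the usual
symptom is an error about not being able to locate winutils.exe in the Hadoop
binaries, and the usual remedy is to point Hadoop at a directory containing
bin\winutils.exe, either via the HADOOP_HOME environment variable or in code.
A small sketch, where the path is only an assumed example:

    // Assumed layout: C:\hadoop\bin\winutils.exe (with hadoop.dll alongside it or on the PATH).
    System.setProperty("hadoop.home.dir", "C:\\hadoop");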

From: James Bond <bo...@gmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Sunday, November 1, 2015 at 9:35 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Re: Utility to push data into HDFS

I am guessing this should work -

https://stackoverflow.com/questions/9722257/building-jar-that-includes-all-its-dependencies

On Sun, Nov 1, 2015 at 8:15 PM, Shashi Vishwakarma <sh...@gmail.com>> wrote:
Hi Chris,

Thanks for your reply. I agree WebHDFS is one of the option to access hadoop from windows or *nix. I wanted to know if I can write a java code will can be executed from windows?

Ex:  java HDFSPut.java  <<- this java code should have FSShell cammand (hadoop fs -ls) written in java.

In order to execute this , what are list items I should have on windows?
For example hadoop jars etc.

If you can throw some light on this then it would be great help.

Thanks
Shashi





On Sun, Nov 1, 2015 at 1:39 AM, Chris Nauroth <cn...@hortonworks.com>> wrote:
Hello Shashi,

Maybe I'm missing some context, but are the Hadoop FsShell commands sufficient?

http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/FileSystemShell.html

These commands work on both *nix and Windows.

Another option would be WebHDFS, which just requires an HTTP client on your platform of choice.

http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/WebHDFS.html

--Chris Nauroth

From: Shashi Vishwakarma <sh...@gmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Saturday, October 31, 2015 at 5:46 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Utility to push data into HDFS


Hi

I need build a common utility for unix/windows based system to push data into hadoop system. User can run that utility from any platform and should be able to push data into HDFS.

Any suggestions ?

Thanks

Shashi



Re: Utility to push data into HDFS

Posted by Chris Nauroth <cn...@hortonworks.com>.
In addition to the standard Hadoop jars available in an Apache Hadoop distro, Windows also requires the native components for Windows: winutils.exe and hadoop.dll.  This wiki page has more details on how that works:

https://wiki.apache.org/hadoop/WindowsProblems

--Chris Nauroth

From: James Bond <bo...@gmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Sunday, November 1, 2015 at 9:35 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Re: Utility to push data into HDFS

I am guessing this should work -

https://stackoverflow.com/questions/9722257/building-jar-that-includes-all-its-dependencies

On Sun, Nov 1, 2015 at 8:15 PM, Shashi Vishwakarma <sh...@gmail.com>> wrote:
Hi Chris,

Thanks for your reply. I agree WebHDFS is one of the option to access hadoop from windows or *nix. I wanted to know if I can write a java code will can be executed from windows?

Ex:  java HDFSPut.java  <<- this java code should have FSShell cammand (hadoop fs -ls) written in java.

In order to execute this , what are list items I should have on windows?
For example hadoop jars etc.

If you can throw some light on this then it would be great help.

Thanks
Shashi





On Sun, Nov 1, 2015 at 1:39 AM, Chris Nauroth <cn...@hortonworks.com>> wrote:
Hello Shashi,

Maybe I'm missing some context, but are the Hadoop FsShell commands sufficient?

http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/FileSystemShell.html

These commands work on both *nix and Windows.

Another option would be WebHDFS, which just requires an HTTP client on your platform of choice.

http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/WebHDFS.html

--Chris Nauroth

From: Shashi Vishwakarma <sh...@gmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Saturday, October 31, 2015 at 5:46 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Utility to push data into HDFS


Hi

I need build a common utility for unix/windows based system to push data into hadoop system. User can run that utility from any platform and should be able to push data into HDFS.

Any suggestions ?

Thanks

Shashi



Re: Utility to push data into HDFS

Posted by Chris Nauroth <cn...@hortonworks.com>.
In addition to the standard Hadoop jars available in an Apache Hadoop distro, Windows also requires the native components for Windows: winutils.exe and hadoop.dll.  This wiki page has more details on how that works:

https://wiki.apache.org/hadoop/WindowsProblems

--Chris Nauroth

From: James Bond <bo...@gmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Sunday, November 1, 2015 at 9:35 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Re: Utility to push data into HDFS

I am guessing this should work -

https://stackoverflow.com/questions/9722257/building-jar-that-includes-all-its-dependencies

On Sun, Nov 1, 2015 at 8:15 PM, Shashi Vishwakarma <sh...@gmail.com>> wrote:
Hi Chris,

Thanks for your reply. I agree WebHDFS is one of the option to access hadoop from windows or *nix. I wanted to know if I can write a java code will can be executed from windows?

Ex:  java HDFSPut.java  <<- this java code should have FSShell cammand (hadoop fs -ls) written in java.

In order to execute this , what are list items I should have on windows?
For example hadoop jars etc.

If you can throw some light on this then it would be great help.

Thanks
Shashi





On Sun, Nov 1, 2015 at 1:39 AM, Chris Nauroth <cn...@hortonworks.com>> wrote:
Hello Shashi,

Maybe I'm missing some context, but are the Hadoop FsShell commands sufficient?

http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/FileSystemShell.html

These commands work on both *nix and Windows.

Another option would be WebHDFS, which just requires an HTTP client on your platform of choice.

http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/WebHDFS.html

--Chris Nauroth

From: Shashi Vishwakarma <sh...@gmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Saturday, October 31, 2015 at 5:46 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Utility to push data into HDFS


Hi

I need build a common utility for unix/windows based system to push data into hadoop system. User can run that utility from any platform and should be able to push data into HDFS.

Any suggestions ?

Thanks

Shashi



Re: Utility to push data into HDFS

Posted by James Bond <bo...@gmail.com>.
I am guessing this should work -

https://stackoverflow.com/questions/9722257/building-jar-that-includes-all-its-dependencies

On Sun, Nov 1, 2015 at 8:15 PM, Shashi Vishwakarma <shashi.vish123@gmail.com
> wrote:

> Hi Chris,
>
> Thanks for your reply. I agree WebHDFS is one of the options for accessing
> Hadoop from Windows or *nix. I wanted to know whether I can write Java code
> that can be executed from Windows.
>
> Ex:  java HDFSPut.java  <<- this Java code should have an FsShell command
> (hadoop fs -ls) written in Java.
>
> In order to execute this, what is the list of items I should have on Windows?
> For example, Hadoop jars etc.
>
> If you can throw some light on this, it would be a great help.
>
> Thanks
> Shashi
>
>
>
>
>
> On Sun, Nov 1, 2015 at 1:39 AM, Chris Nauroth <cn...@hortonworks.com>
> wrote:
>
>> Hello Shashi,
>>
>> Maybe I'm missing some context, but are the Hadoop FsShell commands
>> sufficient?
>>
>>
>> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/FileSystemShell.html
>>
>> These commands work on both *nix and Windows.
>>
>> Another option would be WebHDFS, which just requires an HTTP client on
>> your platform of choice.
>>
>>
>> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
>>
>> --Chris Nauroth
>>
>> From: Shashi Vishwakarma <sh...@gmail.com>
>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Date: Saturday, October 31, 2015 at 5:46 AM
>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Subject: Utility to push data into HDFS
>>
>> Hi
>>
>> I need build a common utility for unix/windows based system to push data
>> into hadoop system. User can run that utility from any platform and should
>> be able to push data into HDFS.
>>
>> Any suggestions ?
>>
>> Thanks
>>
>> Shashi
>>
>
>

Re: Utility to push data into HDFS

Posted by Shashi Vishwakarma <sh...@gmail.com>.
Hi Chris,

Thanks for your reply. I agree WebHDFS is one of the options for accessing
Hadoop from Windows or *nix. I wanted to know whether I can write Java code
that can be executed from Windows.

Ex:  java HDFSPut.java  <<- this Java code should have an FsShell command
(hadoop fs -ls) written in Java.

In order to execute this, what is the list of items I should have on Windows?
For example, Hadoop jars etc.

If you can throw some light on this, it would be a great help.

Thanks
Shashi
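One possible shape for such a wrapper, sketched here with a made-up class name and a placeholder NameNode URI, is to invoke the FsShell class (the implementation behind the hadoop fs command) through ToolRunner:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FsShell;
    import org.apache.hadoop.util.ToolRunner;

    public class HdfsShellExample {
        public static void main(String[] args) throws Exception {
            // Placeholder NameNode URI; use the fs.defaultFS value for your cluster.
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

            // Equivalent of "hadoop fs -ls /" run from Java.
            int exitCode = ToolRunner.run(conf, new FsShell(), new String[] {"-ls", "/"});
            System.exit(exitCode);
        }
    }

On the classpath side this needs the Hadoop client jars, and on Windows additionally the native winutils.exe/hadoop.dll described in Chris Nauroth's reply elsewhere in this thread.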





On Sun, Nov 1, 2015 at 1:39 AM, Chris Nauroth <cn...@hortonworks.com>
wrote:

> Hello Shashi,
>
> Maybe I'm missing some context, but are the Hadoop FsShell commands
> sufficient?
>
>
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/FileSystemShell.html
>
> These commands work on both *nix and Windows.
>
> Another option would be WebHDFS, which just requires an HTTP client on
> your platform of choice.
>
>
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
>
> --Chris Nauroth
>
> From: Shashi Vishwakarma <sh...@gmail.com>
> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Date: Saturday, October 31, 2015 at 5:46 AM
> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Subject: Utility to push data into HDFS
>
> Hi
>
> I need build a common utility for unix/windows based system to push data
> into hadoop system. User can run that utility from any platform and should
> be able to push data into HDFS.
>
> Any suggestions ?
>
> Thanks
>
> Shashi
>
