Posted to common-dev@hadoop.apache.org by Uma Maheswara Rao G 72686 <ma...@huawei.com> on 2011/10/15 13:51:31 UTC

Re: Is there a good way to see how full hdfs is

/** Return the disk usage of the filesystem, including total capacity,
 * used space, and remaining space */
public DiskStatus getDiskStatus() throws IOException {
  return dfs.getDiskStatus();
}

DistributedFileSystem exposes the above method on the Java API side.
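
For example, a minimal standalone client along these lines should print those three numbers. This is only a sketch: it assumes the Hadoop jars and cluster configuration are on the classpath, the DiskStatus getter names are from the 0.20-era API, and (as noted later in this thread) DistributedFileSystem is an internal class and this method is deprecated.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem.DiskStatus;

public class HdfsDiskStatus {
  public static void main(String[] args) throws Exception {
    // Reads fs.default.name from core-site.xml/hdfs-site.xml on the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // The cast is what makes this an internal-API dependency.
    DiskStatus ds = ((DistributedFileSystem) fs).getDiskStatus();
    System.out.println("capacity  = " + ds.getCapacity());
    System.out.println("dfs used  = " + ds.getDfsUsed());
    System.out.println("remaining = " + ds.getRemaining());
  }
}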

Regards,
Uma

----- Original Message -----
From: wd <wd...@wdicc.com>
Date: Saturday, October 15, 2011 4:16 pm
Subject: Re: Is there a good way to see how full hdfs is
To: mapreduce-user@hadoop.apache.org

> hadoop dfsadmin -report
> 
> On Sat, Oct 15, 2011 at 8:16 AM, Steve Lewis 
> <lo...@gmail.com> wrote:
> > We have a small cluster with HDFS running on only 8 nodes - I
> > believe that
> > the partition assigned to hdfs might be getting full and
> > wonder if the web tools or java api have a way to look at free
> > space on hdfs
> >
> > --
> > Steven M. Lewis PhD
> > 4221 105th Ave NE
> > Kirkland, WA 98033
> > 206-384-1340 (cell)
> > Skype lordjoe_com
> >
> >
> >
> 

Re: Is there a good way to see how full hdfs is

Posted by Uma Maheswara Rao G 72686 <ma...@huawei.com>.
Yes, that API was deprecated in trunk.

If you want to use it programmatically, this will be the better option:
/** {@inheritDoc} */
@Override
public FsStatus getStatus(Path p) throws IOException {
  statistics.incrementReadOps(1);
  return dfs.getDiskStatus();
}

This should work for you.

It will give you an FsStatus object, which exposes the following APIs:
getCapacity, getUsed, getRemaining

I would suggest looking over the available FileSystem APIs; that should give you a clear understanding of how to use them.
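
For example, a small client built only on the public FileSystem API might look like this (a sketch, assuming the Hadoop jars and cluster config are on the classpath):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;

public class FsUsage {
  public static void main(String[] args) throws Exception {
    // Picks up fs.default.name from the config files on the classpath.
    FileSystem fs = FileSystem.get(new Configuration());
    FsStatus status = fs.getStatus();
    System.out.println("capacity:  " + status.getCapacity() + " bytes");
    System.out.println("used:      " + status.getUsed() + " bytes");
    System.out.println("remaining: " + status.getRemaining() + " bytes");
  }
}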

Regards,
Uma


----- Original Message -----
From: Ivan.Novick@emc.com
Date: Monday, October 17, 2011 9:48 pm
Subject: Re: Is there a good way to see how full hdfs is
To: common-user@hadoop.apache.org

> Hi Harsh,
> 
> I need access to the data programmatically for system automation, and
> hence I do not want a monitoring tool but access to the raw data.
> 
> I am more than happy to use an exposed function or client program 
> and not
> an internal API.
> 
> So I am still a bit confused... What is the simplest way to get at this
> raw disk usage data programmatically?  Is there an HDFS equivalent of du
> and df, or are you suggesting to just run that on the linux OS (which is
> perfectly doable)?
> 
> Cheers,
> Ivan
> 
> 
> On 10/17/11 9:05 AM, "Harsh J" <ha...@cloudera.com> wrote:
> 
> >Uma/Ivan,
> >
> >The DistributedFileSystem class explicitly is _not_ meant for public
> >consumption, it is an internal one. Additionally, that method has been
> >deprecated.
> >
> >What you need is FileSystem#getStatus() if you want the summarized
> >report via code.
> >
> >A job, that possibly runs "du" or "df", is a good idea if you
> >guarantee perfect homogeneity of path names in your cluster.
> >
> >But I wonder, why won't using a general monitoring tool (such as
> >nagios) for this purpose cut it? What's the end goal here?
> >
> >P.s. I'd moved this conversation to hdfs-user@ earlier on, but now I
> >see it being cross posted into mr-user, common-user, and common-dev --
> >Why?
> >
> >On Mon, Oct 17, 2011 at 9:25 PM, Uma Maheswara Rao G 72686
> ><ma...@huawei.com> wrote:
> >> We can write the simple program and you can call this API.
> >>
> >> Make sure Hadoop jars are present in your class path.
> >> Just for more clarification, DN will send their stats as part of
> >>heartbeats, so NN will maintain all the statistics about the diskspace
> >>usage for the complete filesystem etc... This api will give you those
> >>stats.
> >>
> >> Regards,
> >> Uma
> >>
> >> ----- Original Message -----
> >> From: Ivan.Novick@emc.com
> >> Date: Monday, October 17, 2011 9:07 pm
> >> Subject: Re: Is there a good way to see how full hdfs is
> >> To: common-user@hadoop.apache.org, mapreduce-user@hadoop.apache.org
> >> Cc: common-dev@hadoop.apache.org
> >>
> >>> So is there a client program to call this?
> >>>
> >>> Can one write their own simple client to call this method from all
> >>> disks on the cluster?
> >>>
> >>> How about a map reduce job to collect from all disks on the
> >>> cluster?
> >>>
> >>> On 10/15/11 4:51 AM, "Uma Maheswara Rao G 72686"
> >>> <ma...@huawei.com>wrote:
> >>>
> >>> >/** Return the disk usage of the filesystem, including total capacity,
> >>> >   * used space, and remaining space */
> >>> >  public DiskStatus getDiskStatus() throws IOException {
> >>> >    return dfs.getDiskStatus();
> >>> >  }
> >>> >
> >>> >DistributedFileSystem has the above API from java API side.
> >>> >
> >>> >Regards,
> >>> >Uma
> >>> >
> >>> >----- Original Message -----
> >>> >From: wd <wd...@wdicc.com>
> >>> >Date: Saturday, October 15, 2011 4:16 pm
> >>> >Subject: Re: Is there a good way to see how full hdfs is
> >>> >To: mapreduce-user@hadoop.apache.org
> >>> >
> >>> >> hadoop dfsadmin -report
> >>> >>
> >>> >> On Sat, Oct 15, 2011 at 8:16 AM, Steve Lewis
> >>> >> <lo...@gmail.com> wrote:
> >>> >> > We have a small cluster with HDFS running on only 8 nodes - I
> >>> >> believe that
> >>> >> > the partition assigned to hdfs might be getting full and
> >>> >> > wonder if the web tools or java api have a way to look at free
> >>> >> space on
> >>> >> > hdfs
> >>> >> >
> >>> >> > --
> >>> >> > Steven M. Lewis PhD
> >>> >> > 4221 105th Ave NE
> >>> >> > Kirkland, WA 98033
> >>> >> > 206-384-1340 (cell)
> >>> >> > Skype lordjoe_com
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >>
> >>> >
> >>>
> >>>
> >>
> >
> >
> >
> >-- 
> >Harsh J
> >
> 
> 

Re: Is there a good way to see how full hdfs is

Posted by Mapred Learn <ma...@gmail.com>.
Hi,
I have the same question regarding the documentation, and also:
is there something like this for memory and CPU utilization?

Sent from my iPhone

Thanks,
JJ

On Oct 19, 2011, at 5:00 PM, Rajiv Chittajallu <ra...@yahoo-inc.com> wrote:

> Ivan.Novick@emc.com wrote on 10/18/11 at 09:23:50 -0700:
>> Cool, is there any documentation on how to use the JMX stuff to get
>> monitoring data?
> 
> I don't know if there is any specific documentation. These are the
> mbeans you might be interested in
> 
> Namenode:
> 
> Hadoop:service=NameNode,name=FSNamesystemState
> Hadoop:service=NameNode,name=NameNodeInfo
> Hadoop:service=NameNode,name=jvm
> 
> JobTracker:
> 
> Hadoop:service=JobTracker,name=JobTrackerInfo
> Hadoop:service=JobTracker,name=QueueMetrics,q=<queuename>
> Hadoop:service=JobTracker,name=jvm
> 
> DataNode:
> Hadoop:name=DataNodeInfo,service=DataNode
> 
> TaskTracker:
> Hadoop:service=TaskTracker,name=TaskTrackerInfo
> 
> You may also want to monitor shuffle_exceptions_caught in 
> Hadoop:service=TaskTracker,name=ShuffleServerMetrics 
> 
>> 
>> Cheers,
>> Ivan
>> 
>> On 10/17/11 6:04 PM, "Rajiv Chittajallu" <ra...@yahoo-inc.com> wrote:
>> 
>>> If you are running > 0.20.204
>>> http://phanpy-nn1.hadoop.apache.org:50070/jmx?qry=Hadoop:service=NameNode,
>>> name=NameNodeInfo
>>> 
>>> 
>>> Ivan.Novick@emc.com wrote on 10/17/11 at 09:18:20 -0700:
>>>> Hi Harsh,
>>>> 
>>>> I need access to the data programmatically for system automation, and
>>>> hence
>>>> I do not want a monitoring tool but access to the raw data.
>>>> 
>>>> I am more than happy to use an exposed function or client program and not
>>>> an internal API.
>>>> 
>>>> So I am still a bit confused... What is the simplest way to get at this
>>>> raw disk usage data programmatically?  Is there an HDFS equivalent of du
>>>> and df, or are you suggesting to just run that on the linux OS (which is
>>>> perfectly doable).
>>>> 
>>>> Cheers,
>>>> Ivan
>>>> 
>>>> 
>>>> On 10/17/11 9:05 AM, "Harsh J" <ha...@cloudera.com> wrote:
>>>> 
>>>>> Uma/Ivan,
>>>>> 
>>>>> The DistributedFileSystem class explicitly is _not_ meant for public
>>>>> consumption, it is an internal one. Additionally, that method has been
>>>>> deprecated.
>>>>> 
>>>>> What you need is FileSystem#getStatus() if you want the summarized
>>>>> report via code.
>>>>> 
>>>>> A job, that possibly runs "du" or "df", is a good idea if you
>>>>> guarantee perfect homogeneity of path names in your cluster.
>>>>> 
>>>>> But I wonder, why won't using a general monitoring tool (such as
>>>>> nagios) for this purpose cut it? What's the end goal here?
>>>>> 
>>>>> P.s. I'd moved this conversation to hdfs-user@ earlier on, but now I
>>>>> see it being cross posted into mr-user, common-user, and common-dev --
>>>>> Why?
>>>>> 
>>>>> On Mon, Oct 17, 2011 at 9:25 PM, Uma Maheswara Rao G 72686
>>>>> <ma...@huawei.com> wrote:
>>>>>> We can write the simple program and you can call this API.
>>>>>> 
>>>>>> Make sure Hadoop jars are present in your class path.
>>>>>> Just for more clarification, DN will send their stats as part of
>>>>>> heartbeats, so NN will maintain all the statistics about the diskspace
>>>>>> usage for the complete filesystem etc... This api will give you those
>>>>>> stats.
>>>>>> 
>>>>>> Regards,
>>>>>> Uma
>>>>>> 
>>>>>> ----- Original Message -----
>>>>>> From: Ivan.Novick@emc.com
>>>>>> Date: Monday, October 17, 2011 9:07 pm
>>>>>> Subject: Re: Is there a good way to see how full hdfs is
>>>>>> To: common-user@hadoop.apache.org, mapreduce-user@hadoop.apache.org
>>>>>> Cc: common-dev@hadoop.apache.org
>>>>>> 
>>>>>>> So is there a client program to call this?
>>>>>>> 
>>>>>>> Can one write their own simple client to call this method from all
>>>>>>> disks on the cluster?
>>>>>>> 
>>>>>>> How about a map reduce job to collect from all disks on the cluster?
>>>>>>> 
>>>>>>> On 10/15/11 4:51 AM, "Uma Maheswara Rao G 72686"
>>>>>>> <ma...@huawei.com>wrote:
>>>>>>> 
>>>>>>>> /** Return the disk usage of the filesystem, including total capacity,
>>>>>>>>   * used space, and remaining space */
>>>>>>>> public DiskStatus getDiskStatus() throws IOException {
>>>>>>>>   return dfs.getDiskStatus();
>>>>>>>> }
>>>>>>>> 
>>>>>>>> DistributedFileSystem has the above API from java API side.
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Uma
>>>>>>>> 
>>>>>>>> ----- Original Message -----
>>>>>>>> From: wd <wd...@wdicc.com>
>>>>>>>> Date: Saturday, October 15, 2011 4:16 pm
>>>>>>>> Subject: Re: Is there a good way to see how full hdfs is
>>>>>>>> To: mapreduce-user@hadoop.apache.org
>>>>>>>> 
>>>>>>>>> hadoop dfsadmin -report
>>>>>>>>> 
>>>>>>>>> On Sat, Oct 15, 2011 at 8:16 AM, Steve Lewis
>>>>>>>>> <lo...@gmail.com> wrote:
>>>>>>>>>> We have a small cluster with HDFS running on only 8 nodes - I
>>>>>>>>> believe that
>>>>>>>>>> the partition assigned to hdfs might be getting full and
>>>>>>>>>> wonder if the web tools or java api have a way to look at free
>>>>>>>>> space on
>>>>>>>>>> hdfs
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Steven M. Lewis PhD
>>>>>>>>>> 4221 105th Ave NE
>>>>>>>>>> Kirkland, WA 98033
>>>>>>>>>> 206-384-1340 (cell)
>>>>>>>>>> Skype lordjoe_com
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> Harsh J
>>>>> 
>>>> 
>> 

Re: Is there a good way to see how full hdfs is

Posted by Rajiv Chittajallu <ra...@yahoo-inc.com>.
Ivan.Novick@emc.com wrote on 10/18/11 at 09:23:50 -0700:
>Cool, is there any documentation on how to use the JMX stuff to get
>monitoring data?

I don't know if there is any specific documentation. These are the
mbeans you might be interested in:

Namenode:

Hadoop:service=NameNode,name=FSNamesystemState
Hadoop:service=NameNode,name=NameNodeInfo
Hadoop:service=NameNode,name=jvm

JobTracker:

Hadoop:service=JobTracker,name=JobTrackerInfo
Hadoop:service=JobTracker,name=QueueMetrics,q=<queuename>
Hadoop:service=JobTracker,name=jvm

DataNode:
Hadoop:name=DataNodeInfo,service=DataNode

TaskTracker:
Hadoop:service=TaskTracker,name=TaskTrackerInfo

You may also want to monitor shuffle_exceptions_caught in 
Hadoop:service=TaskTracker,name=ShuffleServerMetrics 
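
As a sketch of reading one of these over remote JMX (this assumes the NameNode JVM was started with com.sun.management.jmxremote.port enabled; the host, port, and attribute names below are examples to verify with jconsole, not guaranteed):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class NameNodeJmxProbe {
  public static void main(String[] args) throws Exception {
    // Example host/port; requires JMX remote to be enabled on the NameNode.
    JMXServiceURL url = new JMXServiceURL(
        "service:jmx:rmi:///jndi/rmi://namenode.example.com:8004/jmxrmi");
    JMXConnector jmxc = JMXConnectorFactory.connect(url);
    try {
      MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
      ObjectName fsState =
          new ObjectName("Hadoop:service=NameNode,name=FSNamesystemState");
      // Attribute names as exposed by the 0.20-era NameNode; check in jconsole.
      System.out.println("CapacityTotal     = " + mbs.getAttribute(fsState, "CapacityTotal"));
      System.out.println("CapacityUsed      = " + mbs.getAttribute(fsState, "CapacityUsed"));
      System.out.println("CapacityRemaining = " + mbs.getAttribute(fsState, "CapacityRemaining"));
    } finally {
      jmxc.close();
    }
  }
}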

>
>Cheers,
>Ivan
>
>On 10/17/11 6:04 PM, "Rajiv Chittajallu" <ra...@yahoo-inc.com> wrote:
>
>>If you are running > 0.20.204
>>http://phanpy-nn1.hadoop.apache.org:50070/jmx?qry=Hadoop:service=NameNode,
>>name=NameNodeInfo
>>
>>
>>Ivan.Novick@emc.com wrote on 10/17/11 at 09:18:20 -0700:
>>>Hi Harsh,
>>>
>>>I need access to the data programmatically for system automation, and
>>>hence
>>>I do not want a monitoring tool but access to the raw data.
>>>
>>>I am more than happy to use an exposed function or client program and not
>>>an internal API.
>>>
>>>So I am still a bit confused... What is the simplest way to get at this
>>>raw disk usage data programmatically?  Is there an HDFS equivalent of du
>>>and df, or are you suggesting to just run that on the linux OS (which is
>>>perfectly doable).
>>>
>>>Cheers,
>>>Ivan
>>>
>>>
>>>On 10/17/11 9:05 AM, "Harsh J" <ha...@cloudera.com> wrote:
>>>
>>>>Uma/Ivan,
>>>>
>>>>The DistributedFileSystem class explicitly is _not_ meant for public
>>>>consumption, it is an internal one. Additionally, that method has been
>>>>deprecated.
>>>>
>>>>What you need is FileSystem#getStatus() if you want the summarized
>>>>report via code.
>>>>
>>>>A job, that possibly runs "du" or "df", is a good idea if you
>>>>guarantee perfect homogeneity of path names in your cluster.
>>>>
>>>>But I wonder, why won't using a general monitoring tool (such as
>>>>nagios) for this purpose cut it? What's the end goal here?
>>>>
>>>>P.s. I'd moved this conversation to hdfs-user@ earlier on, but now I
>>>>see it being cross posted into mr-user, common-user, and common-dev --
>>>>Why?
>>>>
>>>>On Mon, Oct 17, 2011 at 9:25 PM, Uma Maheswara Rao G 72686
>>>><ma...@huawei.com> wrote:
>>>>> We can write the simple program and you can call this API.
>>>>>
>>>>> Make sure Hadoop jars are present in your class path.
>>>>> Just for more clarification, DN will send their stats as part of
>>>>>heartbeats, so NN will maintain all the statistics about the diskspace
>>>>>usage for the complete filesystem etc... This api will give you those
>>>>>stats.
>>>>>
>>>>> Regards,
>>>>> Uma
>>>>>
>>>>> ----- Original Message -----
>>>>> From: Ivan.Novick@emc.com
>>>>> Date: Monday, October 17, 2011 9:07 pm
>>>>> Subject: Re: Is there a good way to see how full hdfs is
>>>>> To: common-user@hadoop.apache.org, mapreduce-user@hadoop.apache.org
>>>>> Cc: common-dev@hadoop.apache.org
>>>>>
>>>>>> So is there a client program to call this?
>>>>>>
>>>>>> Can one write their own simple client to call this method from all
>>>>>> disks on the cluster?
>>>>>>
>>>>>> How about a map reduce job to collect from all disks on the cluster?
>>>>>>
>>>>>> On 10/15/11 4:51 AM, "Uma Maheswara Rao G 72686"
>>>>>> <ma...@huawei.com>wrote:
>>>>>>
>>>>>> >/** Return the disk usage of the filesystem, including total capacity,
>>>>>> >   * used space, and remaining space */
>>>>>> >  public DiskStatus getDiskStatus() throws IOException {
>>>>>> >    return dfs.getDiskStatus();
>>>>>> >  }
>>>>>> >
>>>>>> >DistributedFileSystem has the above API from java API side.
>>>>>> >
>>>>>> >Regards,
>>>>>> >Uma
>>>>>> >
>>>>>> >----- Original Message -----
>>>>>> >From: wd <wd...@wdicc.com>
>>>>>> >Date: Saturday, October 15, 2011 4:16 pm
>>>>>> >Subject: Re: Is there a good way to see how full hdfs is
>>>>>> >To: mapreduce-user@hadoop.apache.org
>>>>>> >
>>>>>> >> hadoop dfsadmin -report
>>>>>> >>
>>>>>> >> On Sat, Oct 15, 2011 at 8:16 AM, Steve Lewis
>>>>>> >> <lo...@gmail.com> wrote:
>>>>>> >> > We have a small cluster with HDFS running on only 8 nodes - I
>>>>>> >> believe that
>>>>>> >> > the partition assigned to hdfs might be getting full and
>>>>>> >> > wonder if the web tools or java api have a way to look at free
>>>>>> >> space on
>>>>>> >> > hdfs
>>>>>> >> >
>>>>>> >> > --
>>>>>> >> > Steven M. Lewis PhD
>>>>>> >> > 4221 105th Ave NE
>>>>>> >> > Kirkland, WA 98033
>>>>>> >> > 206-384-1340 (cell)
>>>>>> >> > Skype lordjoe_com
>>>>>> >> >
>>>>>> >> >
>>>>>> >> >
>>>>>> >>
>>>>>> >
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>-- 
>>>>Harsh J
>>>>
>>>
>

Re: Is there a good way to see how full hdfs is

Posted by Iv...@emc.com.
Cool, is there any documentation on how to use the JMX stuff to get
monitoring data?

Cheers,
Ivan

On 10/17/11 6:04 PM, "Rajiv Chittajallu" <ra...@yahoo-inc.com> wrote:

>If you are running > 0.20.204
>http://phanpy-nn1.hadoop.apache.org:50070/jmx?qry=Hadoop:service=NameNode,
>name=NameNodeInfo
>
>
>Ivan.Novick@emc.com wrote on 10/17/11 at 09:18:20 -0700:
>>Hi Harsh,
>>
>>I need access to the data programmatically for system automation, and
>>hence
>>I do not want a monitoring tool but access to the raw data.
>>
>>I am more than happy to use an exposed function or client program and not
>>an internal API.
>>
>>So I am still a bit confused... What is the simplest way to get at this
>>raw disk usage data programmatically?  Is there an HDFS equivalent of du
>>and df, or are you suggesting to just run that on the linux OS (which is
>>perfectly doable).
>>
>>Cheers,
>>Ivan
>>
>>
>>On 10/17/11 9:05 AM, "Harsh J" <ha...@cloudera.com> wrote:
>>
>>>Uma/Ivan,
>>>
>>>The DistributedFileSystem class explicitly is _not_ meant for public
>>>consumption, it is an internal one. Additionally, that method has been
>>>deprecated.
>>>
>>>What you need is FileSystem#getStatus() if you want the summarized
>>>report via code.
>>>
>>>A job, that possibly runs "du" or "df", is a good idea if you
>>>guarantee perfect homogeneity of path names in your cluster.
>>>
>>>But I wonder, why won't using a general monitoring tool (such as
>>>nagios) for this purpose cut it? What's the end goal here?
>>>
>>>P.s. I'd moved this conversation to hdfs-user@ earlier on, but now I
>>>see it being cross posted into mr-user, common-user, and common-dev --
>>>Why?
>>>
>>>On Mon, Oct 17, 2011 at 9:25 PM, Uma Maheswara Rao G 72686
>>><ma...@huawei.com> wrote:
>>>> We can write the simple program and you can call this API.
>>>>
>>>> Make sure Hadoop jars are present in your class path.
>>>> Just for more clarification, DN will send their stats as part of
>>>>heartbeats, so NN will maintain all the statistics about the diskspace
>>>>usage for the complete filesystem etc... This api will give you those
>>>>stats.
>>>>
>>>> Regards,
>>>> Uma
>>>>
>>>> ----- Original Message -----
>>>> From: Ivan.Novick@emc.com
>>>> Date: Monday, October 17, 2011 9:07 pm
>>>> Subject: Re: Is there a good way to see how full hdfs is
>>>> To: common-user@hadoop.apache.org, mapreduce-user@hadoop.apache.org
>>>> Cc: common-dev@hadoop.apache.org
>>>>
>>>>> So is there a client program to call this?
>>>>>
>>>>> Can one write their own simple client to call this method from all
>>>>> disks on the cluster?
>>>>>
>>>>> How about a map reduce job to collect from all disks on the cluster?
>>>>>
>>>>> On 10/15/11 4:51 AM, "Uma Maheswara Rao G 72686"
>>>>> <ma...@huawei.com>wrote:
>>>>>
>>>>> >/** Return the disk usage of the filesystem, including total capacity,
>>>>> >   * used space, and remaining space */
>>>>> >  public DiskStatus getDiskStatus() throws IOException {
>>>>> >    return dfs.getDiskStatus();
>>>>> >  }
>>>>> >
>>>>> >DistributedFileSystem has the above API from java API side.
>>>>> >
>>>>> >Regards,
>>>>> >Uma
>>>>> >
>>>>> >----- Original Message -----
>>>>> >From: wd <wd...@wdicc.com>
>>>>> >Date: Saturday, October 15, 2011 4:16 pm
>>>>> >Subject: Re: Is there a good way to see how full hdfs is
>>>>> >To: mapreduce-user@hadoop.apache.org
>>>>> >
>>>>> >> hadoop dfsadmin -report
>>>>> >>
>>>>> >> On Sat, Oct 15, 2011 at 8:16 AM, Steve Lewis
>>>>> >> <lo...@gmail.com> wrote:
>>>>> >> > We have a small cluster with HDFS running on only 8 nodes - I
>>>>> >> believe that
>>>>> >> > the partition assigned to hdfs might be getting full and
>>>>> >> > wonder if the web tools or java api have a way to look at free
>>>>> >> space on
>>>>> >> > hdfs
>>>>> >> >
>>>>> >> > --
>>>>> >> > Steven M. Lewis PhD
>>>>> >> > 4221 105th Ave NE
>>>>> >> > Kirkland, WA 98033
>>>>> >> > 206-384-1340 (cell)
>>>>> >> > Skype lordjoe_com
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >>
>>>>> >
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>-- 
>>>Harsh J
>>>
>>


Re: Is there a good way to see how full hdfs is

Posted by Rajiv Chittajallu <ra...@yahoo-inc.com>.
If you are running > 0.20.204 
http://phanpy-nn1.hadoop.apache.org:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo
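
For automation, that servlet returns plain JSON over HTTP, so even a trivial client can pull it. A minimal sketch that just dumps the response (substitute your own NameNode host; no JSON parsing shown):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class JmxServletDump {
  public static void main(String[] args) throws Exception {
    // Replace the host with your NameNode; 50070 is the default HTTP port.
    URL url = new URL("http://namenode.example.com:50070/jmx"
        + "?qry=Hadoop:service=NameNode,name=NameNodeInfo");
    BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
    String line;
    while ((line = in.readLine()) != null) {
      // NameNodeInfo carries fields such as Total, Used, Free, PercentUsed.
      System.out.println(line);
    }
    in.close();
  }
}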


Ivan.Novick@emc.com wrote on 10/17/11 at 09:18:20 -0700:
>Hi Harsh,
>
>I need access to the data programmatically for system automation, and hence
>I do not want a monitoring tool but access to the raw data.
>
>I am more than happy to use an exposed function or client program and not
>an internal API.
>
>So I am still a bit confused... What is the simplest way to get at this
>raw disk usage data programmatically?  Is there an HDFS equivalent of du
>and df, or are you suggesting to just run that on the linux OS (which is
>perfectly doable).
>
>Cheers,
>Ivan
>
>
>On 10/17/11 9:05 AM, "Harsh J" <ha...@cloudera.com> wrote:
>
>>Uma/Ivan,
>>
>>The DistributedFileSystem class explicitly is _not_ meant for public
>>consumption, it is an internal one. Additionally, that method has been
>>deprecated.
>>
>>What you need is FileSystem#getStatus() if you want the summarized
>>report via code.
>>
>>A job, that possibly runs "du" or "df", is a good idea if you
>>guarantee perfect homogeneity of path names in your cluster.
>>
>>But I wonder, why won't using a general monitoring tool (such as
>>nagios) for this purpose cut it? What's the end goal here?
>>
>>P.s. I'd moved this conversation to hdfs-user@ earlier on, but now I
>>see it being cross posted into mr-user, common-user, and common-dev --
>>Why?
>>
>>On Mon, Oct 17, 2011 at 9:25 PM, Uma Maheswara Rao G 72686
>><ma...@huawei.com> wrote:
>>> We can write the simple program and you can call this API.
>>>
>>> Make sure Hadoop jars are present in your class path.
>>> Just for more clarification, DN will send their stats as part of
>>>heartbeats, so NN will maintain all the statistics about the diskspace
>>>usage for the complete filesystem etc... This api will give you those
>>>stats.
>>>
>>> Regards,
>>> Uma
>>>
>>> ----- Original Message -----
>>> From: Ivan.Novick@emc.com
>>> Date: Monday, October 17, 2011 9:07 pm
>>> Subject: Re: Is there a good way to see how full hdfs is
>>> To: common-user@hadoop.apache.org, mapreduce-user@hadoop.apache.org
>>> Cc: common-dev@hadoop.apache.org
>>>
>>>> So is there a client program to call this?
>>>>
>>>> Can one write their own simple client to call this method from all
>>>> disks on the cluster?
>>>>
>>>> How about a map reduce job to collect from all disks on the cluster?
>>>>
>>>> On 10/15/11 4:51 AM, "Uma Maheswara Rao G 72686"
>>>> <ma...@huawei.com>wrote:
>>>>
>>>> >/** Return the disk usage of the filesystem, including total capacity,
>>>> >   * used space, and remaining space */
>>>> >  public DiskStatus getDiskStatus() throws IOException {
>>>> >    return dfs.getDiskStatus();
>>>> >  }
>>>> >
>>>> >DistributedFileSystem has the above API from java API side.
>>>> >
>>>> >Regards,
>>>> >Uma
>>>> >
>>>> >----- Original Message -----
>>>> >From: wd <wd...@wdicc.com>
>>>> >Date: Saturday, October 15, 2011 4:16 pm
>>>> >Subject: Re: Is there a good way to see how full hdfs is
>>>> >To: mapreduce-user@hadoop.apache.org
>>>> >
>>>> >> hadoop dfsadmin -report
>>>> >>
>>>> >> On Sat, Oct 15, 2011 at 8:16 AM, Steve Lewis
>>>> >> <lo...@gmail.com> wrote:
>>>> >> > We have a small cluster with HDFS running on only 8 nodes - I
>>>> >> believe that
>>>> >> > the partition assigned to hdfs might be getting full and
>>>> >> > wonder if the web tools or java api have a way to look at free
>>>> >> space on
>>>> >> > hdfs
>>>> >> >
>>>> >> > --
>>>> >> > Steven M. Lewis PhD
>>>> >> > 4221 105th Ave NE
>>>> >> > Kirkland, WA 98033
>>>> >> > 206-384-1340 (cell)
>>>> >> > Skype lordjoe_com
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >>
>>>> >
>>>>
>>>>
>>>
>>
>>
>>
>>-- 
>>Harsh J
>>
>

Re: Is there a good way to see how full hdfs is

Posted by Iv...@emc.com.
Hi Harsh,

I need access to the data programmatically for system automation, and hence
I do not want a monitoring tool but access to the raw data.

I am more than happy to use an exposed function or client program and not
an internal API.

So I am still a bit confused... What is the simplest way to get at this
raw disk usage data programmatically?  Is there an HDFS equivalent of du
and df, or are you suggesting to just run that on the linux OS (which is
perfectly doable)?

Cheers,
Ivan


On 10/17/11 9:05 AM, "Harsh J" <ha...@cloudera.com> wrote:

>Uma/Ivan,
>
>The DistributedFileSystem class explicitly is _not_ meant for public
>consumption, it is an internal one. Additionally, that method has been
>deprecated.
>
>What you need is FileSystem#getStatus() if you want the summarized
>report via code.
>
>A job, that possibly runs "du" or "df", is a good idea if you
>guarantee perfect homogeneity of path names in your cluster.
>
>But I wonder, why won't using a general monitoring tool (such as
>nagios) for this purpose cut it? What's the end goal here?
>
>P.s. I'd moved this conversation to hdfs-user@ earlier on, but now I
>see it being cross posted into mr-user, common-user, and common-dev --
>Why?
>
>On Mon, Oct 17, 2011 at 9:25 PM, Uma Maheswara Rao G 72686
><ma...@huawei.com> wrote:
>> We can write the simple program and you can call this API.
>>
>> Make sure Hadoop jars are present in your class path.
>> Just for more clarification, DN will send their stats as part of
>>heartbeats, so NN will maintain all the statistics about the diskspace
>>usage for the complete filesystem etc... This api will give you those
>>stats.
>>
>> Regards,
>> Uma
>>
>> ----- Original Message -----
>> From: Ivan.Novick@emc.com
>> Date: Monday, October 17, 2011 9:07 pm
>> Subject: Re: Is there a good way to see how full hdfs is
>> To: common-user@hadoop.apache.org, mapreduce-user@hadoop.apache.org
>> Cc: common-dev@hadoop.apache.org
>>
>>> So is there a client program to call this?
>>>
>>> Can one write their own simple client to call this method from all
>>> disks on the cluster?
>>>
>>> How about a map reduce job to collect from all disks on the cluster?
>>>
>>> On 10/15/11 4:51 AM, "Uma Maheswara Rao G 72686"
>>> <ma...@huawei.com>wrote:
>>>
>>> >/** Return the disk usage of the filesystem, including total capacity,
>>> >   * used space, and remaining space */
>>> >  public DiskStatus getDiskStatus() throws IOException {
>>> >    return dfs.getDiskStatus();
>>> >  }
>>> >
>>> >DistributedFileSystem has the above API from java API side.
>>> >
>>> >Regards,
>>> >Uma
>>> >
>>> >----- Original Message -----
>>> >From: wd <wd...@wdicc.com>
>>> >Date: Saturday, October 15, 2011 4:16 pm
>>> >Subject: Re: Is there a good way to see how full hdfs is
>>> >To: mapreduce-user@hadoop.apache.org
>>> >
>>> >> hadoop dfsadmin -report
>>> >>
>>> >> On Sat, Oct 15, 2011 at 8:16 AM, Steve Lewis
>>> >> <lo...@gmail.com> wrote:
>>> >> > We have a small cluster with HDFS running on only 8 nodes - I
>>> >> believe that
>>> >> > the partition assigned to hdfs might be getting full and
>>> >> > wonder if the web tools or java api have a way to look at free
>>> >> space on
>>> >> > hdfs
>>> >> >
>>> >> > --
>>> >> > Steven M. Lewis PhD
>>> >> > 4221 105th Ave NE
>>> >> > Kirkland, WA 98033
>>> >> > 206-384-1340 (cell)
>>> >> > Skype lordjoe_com
>>> >> >
>>> >> >
>>> >> >
>>> >>
>>> >
>>>
>>>
>>
>
>
>
>-- 
>Harsh J
>


Re: Is there a good way to see how full hdfs is

Posted by Harsh J <ha...@cloudera.com>.
Uma/Ivan,

The DistributedFileSystem class explicitly is _not_ meant for public
consumption, it is an internal one. Additionally, that method has been
deprecated.

What you need is FileSystem#getStatus() if you want the summarized
report via code.

A job that runs, say, "du" or "df" is a good idea if you can
guarantee perfect homogeneity of path names in your cluster.
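
For the HDFS-side equivalent of "du" on a particular path (as opposed to local-disk numbers), there is also the public FileSystem#getContentSummary(Path). A rough sketch, where the path below is only a placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDu {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Placeholder path; in practice take it from args.
    ContentSummary cs = fs.getContentSummary(new Path("/user/example"));
    System.out.println("length (du -s style)        = " + cs.getLength());
    System.out.println("space consumed (x replicas) = " + cs.getSpaceConsumed());
  }
}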

But I wonder, why won't using a general monitoring tool (such as
nagios) for this purpose cut it? What's the end goal here?

P.s. I'd moved this conversation to hdfs-user@ earlier on, but now I
see it being cross posted into mr-user, common-user, and common-dev --
Why?

On Mon, Oct 17, 2011 at 9:25 PM, Uma Maheswara Rao G 72686
<ma...@huawei.com> wrote:
> We can write the simple program and you can call this API.
>
> Make sure Hadoop jars are present in your class path.
> Just for more clarification, DN will send their stats as part of heartbeats, so NN will maintain all the statistics about the diskspace usage for the complete filesystem etc... This api will give you those stats.
>
> Regards,
> Uma
>
> ----- Original Message -----
> From: Ivan.Novick@emc.com
> Date: Monday, October 17, 2011 9:07 pm
> Subject: Re: Is there a good way to see how full hdfs is
> To: common-user@hadoop.apache.org, mapreduce-user@hadoop.apache.org
> Cc: common-dev@hadoop.apache.org
>
>> So is there a client program to call this?
>>
>> Can one write their own simple client to call this method from all
>> disks on the cluster?
>>
>> How about a map reduce job to collect from all disks on the cluster?
>>
>> On 10/15/11 4:51 AM, "Uma Maheswara Rao G 72686"
>> <ma...@huawei.com>wrote:
>>
>> >/** Return the disk usage of the filesystem, including total capacity,
>> >   * used space, and remaining space */
>> >  public DiskStatus getDiskStatus() throws IOException {
>> >    return dfs.getDiskStatus();
>> >  }
>> >
>> >DistributedFileSystem has the above API from java API side.
>> >
>> >Regards,
>> >Uma
>> >
>> >----- Original Message -----
>> >From: wd <wd...@wdicc.com>
>> >Date: Saturday, October 15, 2011 4:16 pm
>> >Subject: Re: Is there a good way to see how full hdfs is
>> >To: mapreduce-user@hadoop.apache.org
>> >
>> >> hadoop dfsadmin -report
>> >>
>> >> On Sat, Oct 15, 2011 at 8:16 AM, Steve Lewis
>> >> <lo...@gmail.com> wrote:
>> >> > We have a small cluster with HDFS running on only 8 nodes - I
>> >> believe that
>> >> > the partition assigned to hdfs might be getting full and
>> >> > wonder if the web tools or java api have a way to look at free
>> >> space on
>> >> > hdfs
>> >> >
>> >> > --
>> >> > Steven M. Lewis PhD
>> >> > 4221 105th Ave NE
>> >> > Kirkland, WA 98033
>> >> > 206-384-1340 (cell)
>> >> > Skype lordjoe_com
>> >> >
>> >> >
>> >> >
>> >>
>> >
>>
>>
>



-- 
Harsh J

Re: Is there a good way to see how full hdfs is

Posted by Uma Maheswara Rao G 72686 <ma...@huawei.com>.
We can write a simple program and you can call this API.

Make sure the Hadoop jars are present in your class path.
Just for more clarification: the DNs send their stats as part of heartbeats, so the NN maintains all the statistics about disk space usage for the complete filesystem. This API will give you those stats.
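
Since the stated goal is system automation, a threshold check can sit directly on top of FileSystem#getStatus(). A sketch, where the 90% threshold is only an arbitrary example:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;

public class HdfsFullCheck {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FsStatus st = fs.getStatus();
    double pctUsed = 100.0 * st.getUsed() / st.getCapacity();
    System.out.printf("HDFS is %.1f%% full%n", pctUsed);
    // Arbitrary example threshold; nonzero exit lets cron or a script alert.
    System.exit(pctUsed > 90.0 ? 1 : 0);
  }
}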

Regards,
Uma

----- Original Message -----
From: Ivan.Novick@emc.com
Date: Monday, October 17, 2011 9:07 pm
Subject: Re: Is there a good way to see how full hdfs is
To: common-user@hadoop.apache.org, mapreduce-user@hadoop.apache.org
Cc: common-dev@hadoop.apache.org

> So is there a client program to call this?
> 
> Can one write their own simple client to call this method from all 
> disks on the cluster?
> 
> How about a map reduce job to collect from all disks on the cluster?
> 
> On 10/15/11 4:51 AM, "Uma Maheswara Rao G 72686" 
> <ma...@huawei.com>wrote:
> 
> >/** Return the disk usage of the filesystem, including total capacity,
> >   * used space, and remaining space */
> >  public DiskStatus getDiskStatus() throws IOException {
> >    return dfs.getDiskStatus();
> >  }
> >
> >DistributedFileSystem has the above API from java API side.
> >
> >Regards,
> >Uma
> >
> >----- Original Message -----
> >From: wd <wd...@wdicc.com>
> >Date: Saturday, October 15, 2011 4:16 pm
> >Subject: Re: Is there a good way to see how full hdfs is
> >To: mapreduce-user@hadoop.apache.org
> >
> >> hadoop dfsadmin -report
> >> 
> >> On Sat, Oct 15, 2011 at 8:16 AM, Steve Lewis
> >> <lo...@gmail.com> wrote:
> >> > We have a small cluster with HDFS running on only 8 nodes - I
> >> believe that
> >> > the partition assigned to hdfs might be getting full and
> >> > wonder if the web tools or java api have a way to look at free
> >> space on
> >> > hdfs
> >> >
> >> > --
> >> > Steven M. Lewis PhD
> >> > 4221 105th Ave NE
> >> > Kirkland, WA 98033
> >> > 206-384-1340 (cell)
> >> > Skype lordjoe_com
> >> >
> >> >
> >> >
> >> 
> >
> 
> 

Re: Is there a good way to see how full hdfs is

Posted by Iv...@emc.com.
So is there a client program to call this?

Can one write their own simple client to call this method from all disks
on the cluster?  

How about a map reduce job to collect from all disks on the cluster?

On 10/15/11 4:51 AM, "Uma Maheswara Rao G 72686" <ma...@huawei.com>
wrote:

>/** Return the disk usage of the filesystem, including total capacity,
>   * used space, and remaining space */
>  public DiskStatus getDiskStatus() throws IOException {
>    return dfs.getDiskStatus();
>  }
>
>DistributedFileSystem has the above API from java API side.
>
>Regards,
>Uma
>
>----- Original Message -----
>From: wd <wd...@wdicc.com>
>Date: Saturday, October 15, 2011 4:16 pm
>Subject: Re: Is there a good way to see how full hdfs is
>To: mapreduce-user@hadoop.apache.org
>
>> hadoop dfsadmin -report
>> 
>> On Sat, Oct 15, 2011 at 8:16 AM, Steve Lewis
>> <lo...@gmail.com> wrote:
>> > We have a small cluster with HDFS running on only 8 nodes - I
>> believe that
>> > the partition assigned to hdfs might be getting full and
>> > wonder if the web tools or java api have a way to look at free
>> space on
>> > hdfs
>> >
>> > --
>> > Steven M. Lewis PhD
>> > 4221 105th Ave NE
>> > Kirkland, WA 98033
>> > 206-384-1340 (cell)
>> > Skype lordjoe_com
>> >
>> >
>> >
>> 
>