Posted to common-dev@hadoop.apache.org by "Philip Zeyliger (JIRA)" <ji...@apache.org> on 2009/03/12 02:37:50 UTC

[jira] Created: (HADOOP-5469) Exposing Hadoop metrics via HTTP

Exposing Hadoop metrics via HTTP
--------------------------------

                 Key: HADOOP-5469
                 URL: https://issues.apache.org/jira/browse/HADOOP-5469
             Project: Hadoop Core
          Issue Type: New Feature
          Components: metrics
            Reporter: Philip Zeyliger


I'd like to be able to query Hadoop's metrics via HTTP, e.g., by going to "/metrics" on any Hadoop daemon that has an HttpServer.  My motivation is pretty simple--if you're running on a lot of machines, tracking down the relevant metrics files is pretty time-consuming; this would be a useful debugging utility.  I'd also like the output to be parseable, so I could write a quick web app to query the metrics dynamically.

This is similar in spirit to, but different from, just using JMX.  (See also HADOOP-4756.)  JMX requires a client, and, more annoyingly, JMX requires setting up authentication.  If you just disable authentication, someone can do Bad Things, and if you enable it, you have to worry about yet another password.  The HTTP approach is also more complete--JMX requires separate instrumentation, so, for example, the JobTracker's metrics aren't exposed via JMX.

To start the discussion going, I've attached a patch.  I had to add a method to ContextFactory to get all the active MetricsContexts, implement a do-little MetricsContext that simply inherits from AbstractMetricsContext, add a method to MetricsContext to get all the records, expose copy methods for the maps in OutputRecord, and implement a simple servlet.  I also ended up consolidating some code for setting the period that was duplicated across the MetricsContext implementations; I'm open to reverting that if it muddies the patch significantly.
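To make the shape of this concrete, here's a small, self-contained sketch of the rendering the servlet does: each output record is a tag map plus a metric map, grouped under a context name and a record name, printed with two spaces of indent per level.  The class and method names here are illustrative only, not the patch's actual API.

```java
import java.util.*;

public class MetricsDump {
    // One output record: a tag map plus a metric map (sorted for stable output).
    static class Record {
        final SortedMap<String, String> tags = new TreeMap<>();
        final SortedMap<String, Number> metrics = new TreeMap<>();
    }

    // Tags render as {k1=v1, k2=v2}, matching the sample output below.
    static String formatTags(SortedMap<String, String> tags) {
        StringJoiner j = new StringJoiner(", ", "{", "}");
        tags.forEach((k, v) -> j.add(k + "=" + v));
        return j.toString();
    }

    // Render context -> record name -> records in the indented plain-text layout.
    static String render(SortedMap<String, SortedMap<String, List<Record>>> contexts) {
        StringBuilder sb = new StringBuilder();
        contexts.forEach((context, byName) -> {
            sb.append(context).append('\n');
            byName.forEach((name, records) -> {
                sb.append("  ").append(name).append('\n');
                for (Record r : records) {
                    sb.append("    ").append(formatTags(r.tags)).append('\n');
                    r.metrics.forEach((k, v) ->
                        sb.append("      ").append(k).append('=').append(v).append('\n'));
                }
            });
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        Record r = new Record();
        r.tags.put("hostName", "doorstop.local");
        r.tags.put("sessionId", "");
        r.metrics.put("jobs_submitted", 1);
        SortedMap<String, List<Record>> mapred = new TreeMap<>();
        mapred.put("jobtracker", List.of(r));
        SortedMap<String, SortedMap<String, List<Record>>> all = new TreeMap<>();
        all.put("mapred", mapred);
        System.out.print(render(all));
    }
}
```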

I'd love to hear your suggestions.  There's a bug in the JSON representation, and there's some gross type-handling.

The patch is missing tests.  I wanted to post to gather feedback before I got too far, but tests are forthcoming.

Here's a sample output for a job tracker, while it was running a "pi" job:

{noformat}
jvm
  metrics
    {hostName=doorstop.local, processName=JobTracker, sessionId=}
      gcCount=22
      gcTimeMillis=68
      logError=0
      logFatal=0
      logInfo=52
      logWarn=0
      memHeapCommittedM=7.4375
      memHeapUsedM=4.2150116
      memNonHeapCommittedM=23.1875
      memNonHeapUsedM=18.438614
      threadsBlocked=0
      threadsNew=0
      threadsRunnable=7
      threadsTerminated=0
      threadsTimedWaiting=8
      threadsWaiting=15
mapred
  job
    {counter=Map input records, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=2.0
    {counter=Map output records, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=4.0
    {counter=Data-local map tasks, group=Job Counters , hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=4.0
    {counter=Map input bytes, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=48.0
    {counter=FILE_BYTES_WRITTEN, group=FileSystemCounters, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=148.0
    {counter=Combine output records, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=0.0
    {counter=Launched map tasks, group=Job Counters , hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=4.0
    {counter=HDFS_BYTES_READ, group=FileSystemCounters, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=236.0
    {counter=Map output bytes, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=64.0
    {counter=Launched reduce tasks, group=Job Counters , hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=1.0
    {counter=Spilled Records, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=4.0
    {counter=Combine input records, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=0.0
  jobtracker
    {hostName=doorstop.local, sessionId=}
      jobs_completed=0
      jobs_submitted=1
      maps_completed=2
      maps_launched=5
      reduces_completed=0
      reduces_launched=1
rpc
  metrics
    {hostName=doorstop.local, port=50030}
      NumOpenConnections=2
      RpcProcessingTime_avg_time=0
      RpcProcessingTime_num_ops=84
      RpcQueueTime_avg_time=1
      RpcQueueTime_num_ops=84
      callQueueLen=0
      getBuildVersion_avg_time=0
      getBuildVersion_num_ops=1
      getJobProfile_avg_time=0
      getJobProfile_num_ops=17
      getJobStatus_avg_time=0
      getJobStatus_num_ops=32
      getNewJobId_avg_time=0
      getNewJobId_num_ops=1
      getProtocolVersion_avg_time=0
      getProtocolVersion_num_ops=2
      getSystemDir_avg_time=0
      getSystemDir_num_ops=2
      getTaskCompletionEvents_avg_time=0
      getTaskCompletionEvents_num_ops=19
      heartbeat_avg_time=5
      heartbeat_num_ops=9
      submitJob_avg_time=0
      submitJob_num_ops=1
{noformat}
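For what it's worth, the indented layout above parses mechanically by indent depth (0 spaces = context, 2 = record name, 4 = tag line, 6 = metric).  A rough, standalone sketch--not code from the patch--that flattens metric lines into dotted keys:

```java
import java.util.*;

public class MetricsParser {
    // Flatten metric lines into "context.record.key=value" strings; tag lines
    // (indent 4) are skipped here for brevity.
    static List<String> parse(String text) {
        List<String> out = new ArrayList<>();
        String context = "", record = "";
        for (String line : text.split("\n")) {
            int indent = 0;
            while (indent < line.length() && line.charAt(indent) == ' ') indent++;
            String body = line.trim();
            if (body.isEmpty()) continue;
            if (indent == 0) context = body;
            else if (indent == 2) record = body;
            else if (indent == 6) out.add(context + "." + record + "." + body);
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "jvm\n  metrics\n    {hostName=doorstop.local}\n"
                + "      gcCount=22\n      gcTimeMillis=68\n";
        System.out.println(parse(sample));
        // prints [jvm.metrics.gcCount=22, jvm.metrics.gcTimeMillis=68]
    }
}
```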

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5469) Exposing Hadoop metrics via HTTP

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701597#action_12701597 ] 

Philip Zeyliger commented on HADOOP-5469:
-----------------------------------------

The same way we protect the various status pages, the RPC ports, the sockets that the data nodes will happily send you blocks over?  (Namely, not at all, until Hadoop has a security framework.)

> Exposing Hadoop metrics via HTTP
> --------------------------------
>
>                 Key: HADOOP-5469
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5469
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: metrics
>            Reporter: Philip Zeyliger
>            Assignee: Philip Zeyliger
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-5469.patch, HADOOP-5469.patch
>
>          Time Spent: 2h
>  Remaining Estimate: 1.5h
>
> Implement a "/metrics" URL on the HTTP server of Hadoop daemons, to expose metrics data to users via their web browsers, in plain-text and JSON.



[jira] Commented: (HADOOP-5469) Exposing Hadoop metrics via HTTP

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701594#action_12701594 ] 

Allen Wittenauer commented on HADOOP-5469:
------------------------------------------

So how do we protect this new interface from prying eyes?



[jira] Assigned: (HADOOP-5469) Exposing Hadoop metrics via HTTP

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley reassigned HADOOP-5469:
-------------------------------------

    Assignee: Philip Zeyliger



[jira] Work logged: (HADOOP-5469) Exposing Hadoop metrics via HTTP

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#action_10816 ]

Philip Zeyliger logged work on HADOOP-5469:
-------------------------------------------

                Author: Philip Zeyliger
            Created on: 31/Mar/09 01:21 PM
            Start Date: 31/Mar/09 01:18 PM
    Worklog Time Spent: 2h 
      Work Description: I've thrown this up on the Hadoop JIRA.  Now I'm blocked until someone reviews it, which might be Tom.

For my future reference, here's how you run the "test-patch" task on ant.  It takes a long time and spits out esoteric error messages.  (Turns out you need a block of the Apache license at the top of every file; who knew!)
{noformat}
ANT_HOME=/usr/share/ant ant -Dpatch.file=../hadoop-trunk2/HADOOP-5469.patch -Dforrest.home=$HOME/pub/apache-forrest-0.8 -Dfindbugs.home=$HOME/pub/findbugs-1.3.8 -Djava5.home=/System/Library/Frameworks/JavaVM.framework/Versions/1.5/Home -Dscratch.dir=/tmp/philip test-patch
{noformat}

I expect this will take some more time after the review.

Issue Time Tracking
-------------------

            Time Spent: 2h
    Remaining Estimate: 1.5h



[jira] Updated: (HADOOP-5469) Exposing Hadoop metrics via HTTP

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated HADOOP-5469:
------------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (HADOOP-5469) Exposing Hadoop metrics via HTTP

Posted by "Marco Nicosia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701599#action_12701599 ] 

Marco Nicosia commented on HADOOP-5469:
---------------------------------------

Opened HADOOP-5722 to make this a configurable feature. And yes, we continue to lobby for better protection of all of Hadoop's ports. In the meantime, we prefer not to open additional holes if possible.




[jira] Resolved: (HADOOP-5469) Exposing Hadoop metrics via HTTP

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting resolved HADOOP-5469.
----------------------------------

    Resolution: Fixed

Oops.  Forgot to add the new files.  Fixed.



[jira] Updated: (HADOOP-5469) Exposing Hadoop metrics via HTTP

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-5469:
---------------------------------

       Resolution: Fixed
    Fix Version/s: 0.21.0
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

I just committed this.  Thanks, Philip!



[jira] Updated: (HADOOP-5469) Exposing Hadoop metrics via HTTP

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated HADOOP-5469:
------------------------------------

    Attachment: HADOOP-5469.patch

Uploading a new patch.

This one fixes the JSON generation, and includes tests for MetricsServlet, as well as the new functionality I added to OutputRecord.

Reviews appreciated!
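For readers following along, here is a hypothetical sketch of the JSON shape such a servlet might emit per record--tags and metrics as two small objects.  This is illustrative only, not the serialization the patch actually produces.

```java
import java.util.*;

public class MetricsJson {
    // Serialize one output record (tags + metrics) as a small JSON object.
    // Assumes tag values need no escaping beyond the surrounding quotes.
    static String toJson(SortedMap<String, String> tags, SortedMap<String, Number> metrics) {
        StringJoiner t = new StringJoiner(",", "{", "}");
        tags.forEach((k, v) -> t.add("\"" + k + "\":\"" + v + "\""));
        StringJoiner m = new StringJoiner(",", "{", "}");
        metrics.forEach((k, v) -> m.add("\"" + k + "\":" + v));
        return "{\"tags\":" + t + ",\"metrics\":" + m + "}";
    }

    public static void main(String[] args) {
        SortedMap<String, String> tags = new TreeMap<>(Map.of("hostName", "doorstop.local"));
        SortedMap<String, Number> metrics = new TreeMap<>(Map.of("gcCount", 22));
        System.out.println(toJson(tags, metrics));
        // prints {"tags":{"hostName":"doorstop.local"},"metrics":{"gcCount":22}}
    }
}
```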



[jira] Updated: (HADOOP-5469) Exposing Hadoop metrics via HTTP

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated HADOOP-5469:
------------------------------------

    Attachment: HADOOP-5469.patch

Attaching patch.  Hudson will complain about missing unit tests, and it will be right.  The patch is small enough that I hope having it around will help discussion.



[jira] Commented: (HADOOP-5469) Exposing Hadoop metrics via HTTP

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694354#action_12694354 ] 

Philip Zeyliger commented on HADOOP-5469:
-----------------------------------------

The same two tests seem to be failing in trunk, according to http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-trunk/792/testReport/ .  The tests are as follows, and don't relate to this patch.

* org.apache.hadoop.mapred.TestCapacityScheduler.testHighMemoryJobWithInvalidRequirements
* org.apache.hadoop.mapred.TestCapacityScheduler.testClusterBlockingForLackOfMemory



[jira] Commented: (HADOOP-5469) Exposing Hadoop metrics via HTTP

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694353#action_12694353 ] 

Hadoop QA commented on HADOOP-5469:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12404254/HADOOP-5469.patch
  against trunk revision 760651.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/85/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/85/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/85/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/85/console

This message is automatically generated.



[jira] Updated: (HADOOP-5469) Exposing Hadoop metrics via HTTP

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated HADOOP-5469:
------------------------------------

    Description: Implement a "/metrics" URL on the HTTP server of Hadoop daemons, to expose metrics data to users via their web browsers, in plain-text and JSON.  (was: I'd like to be able to query Hadoop's metrics via HTTP, e.g., by going to "/metrics" on any Hadoop daemon that has an HttpServer.  My motivation is pretty simple--if you're running on a lot of machines, tracking down the relevant metrics files is pretty time-consuming; this would be a useful debugging utility.  I'd also like the output to be parseable, so I could write a quick web app to query the metrics dynamically.

This is similar in spirit, but different, from just using JMX.  (See also HADOOP-4756.)  JMX requires a client, and, more annoyingly, JMX requires setting up authentication.  If you just disable authentication, someone can do Bad Things, and if you enable it, you have to worry about yet another password. It's also more complete--JMX require separate instrumentation, so, for example, the JobTracker's metrics aren't exposed via JMX.

To start the discussion going, I've attached a patch.  I had to add a method to ContextFactory to get all the active MetrixContexts, implement a do-little MetricsContext that simply inherits from AbstractMetricsContext, add a method to MetricsContext to get all the records, expose copy methods for the maps in OutputRecord, and implemented an easy servlet.  I ended up removing some
common code from all MetricsContexts, for setting the period; I'm open to taking that out if it muddies the patch significantly.

I'd love to hear your suggestions.  There's a bug in the JSON representation, and there's some gross type-handling.

The patch is missing tests.  I wanted to post to gather feedback before I got too far, but tests are forthcoming.

Here's a sample output for a job tracker, while it was running a "pi" job:

{noformat}
jvm
  metrics
    {hostName=doorstop.local, processName=JobTracker, sessionId=}
      gcCount=22
      gcTimeMillis=68
      logError=0
      logFatal=0
      logInfo=52
      logWarn=0
      memHeapCommittedM=7.4375
      memHeapUsedM=4.2150116
      memNonHeapCommittedM=23.1875
      memNonHeapUsedM=18.438614
      threadsBlocked=0
      threadsNew=0
      threadsRunnable=7
      threadsTerminated=0
      threadsTimedWaiting=8
      threadsWaiting=15
mapred
  job
    {counter=Map input records, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=2.0
    {counter=Map output records, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=4.0
    {counter=Data-local map tasks, group=Job Counters , hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=4.0
    {counter=Map input bytes, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=48.0
    {counter=FILE_BYTES_WRITTEN, group=FileSystemCounters, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=148.0
    {counter=Combine output records, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=0.0
    {counter=Launched map tasks, group=Job Counters , hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=4.0
    {counter=HDFS_BYTES_READ, group=FileSystemCounters, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=236.0
    {counter=Map output bytes, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=64.0
    {counter=Launched reduce tasks, group=Job Counters , hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=1.0
    {counter=Spilled Records, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=4.0
    {counter=Combine input records, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=0.0
  jobtracker
    {hostName=doorstop.local, sessionId=}
      jobs_completed=0
      jobs_submitted=1
      maps_completed=2
      maps_launched=5
      reduces_completed=0
      reduces_launched=1
rpc
  metrics
    {hostName=doorstop.local, port=50030}
      NumOpenConnections=2
      RpcProcessingTime_avg_time=0
      RpcProcessingTime_num_ops=84
      RpcQueueTime_avg_time=1
      RpcQueueTime_num_ops=84
      callQueueLen=0
      getBuildVersion_avg_time=0
      getBuildVersion_num_ops=1
      getJobProfile_avg_time=0
      getJobProfile_num_ops=17
      getJobStatus_avg_time=0
      getJobStatus_num_ops=32
      getNewJobId_avg_time=0
      getNewJobId_num_ops=1
      getProtocolVersion_avg_time=0
      getProtocolVersion_num_ops=2
      getSystemDir_avg_time=0
      getSystemDir_num_ops=2
      getTaskCompletionEvents_avg_time=0
      getTaskCompletionEvents_num_ops=19
      heartbeat_avg_time=5
      heartbeat_num_ops=9
      submitJob_avg_time=0
      submitJob_num_ops=1
{noformat}

I've been schooled that descriptions ought to be short and comments lengthy, so I've shortened the description; the original follows.


I'd like to be able to query Hadoop's metrics via HTTP, e.g., by going to "/metrics" on any Hadoop daemon that has an HttpServer.  My motivation is pretty simple--if you're running on a lot of machines, tracking down the relevant metrics files is pretty time-consuming; this would be a useful debugging utility.  I'd also like the output to be parseable, so I could write a quick web app to query the metrics dynamically.

This is similar in spirit to, but different from, just using JMX.  (See also HADOOP-4756.)  JMX requires a client, and, more annoyingly, JMX requires setting up authentication.  If you just disable authentication, someone can do Bad Things, and if you enable it, you have to worry about yet another password.  This approach is also more complete--JMX requires separate instrumentation, so, for example, the JobTracker's metrics aren't exposed via JMX.

To start the discussion going, I've attached a patch.  I had to add a method to ContextFactory to get all the active MetricsContexts, implement a do-little MetricsContext that simply inherits from AbstractMetricsContext, add a method to MetricsContext to get all the records, expose copy methods for the maps in OutputRecord, and implement a simple servlet.  I ended up removing some common code from all MetricsContexts, for setting the period; I'm open to taking that out if it muddies the patch significantly.
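The servlet's rendering step can be sketched in miniature.  The following Python stand-in uses plain dicts in place of Hadoop's MetricsContext and OutputRecord types — the function name and data shape are illustrative only, not the patch's actual API — and produces the indented plain-text layout shown in the sample output above:

```python
# Minimal stand-in for the proposed /metrics servlet's plain-text rendering.
# Plain dicts replace Hadoop's MetricsContext / OutputRecord types; names
# and structure are illustrative, not the patch's actual API.

def render_metrics(contexts):
    """contexts maps context name -> record name -> list of (tags, metrics)
    dict pairs; returns the indented plain-text layout."""
    lines = []
    for context_name in sorted(contexts):
        lines.append(context_name)                       # e.g. "rpc"
        for record_name in sorted(contexts[context_name]):
            lines.append("  " + record_name)             # e.g. "metrics"
            for tags, metrics in contexts[context_name][record_name]:
                tag_text = ", ".join(
                    "%s=%s" % (k, tags[k]) for k in sorted(tags))
                lines.append("    {%s}" % tag_text)      # the tag set
                for name in sorted(metrics):
                    lines.append("      %s=%s" % (name, metrics[name]))
    return "\n".join(lines)

print(render_metrics({
    "rpc": {"metrics": [({"hostName": "doorstop.local", "port": 50030},
                         {"NumOpenConnections": 2, "callQueueLen": 0})]}}))
```

The same traversal, handed to a JSON serializer instead of a string builder, would give the parseable form.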

I'd love to hear your suggestions.  There's a bug in the JSON representation, and there's some gross type-handling.

> Exposing Hadoop metrics via HTTP
> --------------------------------
>
>                 Key: HADOOP-5469
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5469
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: metrics
>            Reporter: Philip Zeyliger
>         Attachments: HADOOP-5469.patch
>
>
> Implement a "/metrics" URL on the HTTP server of Hadoop daemons, to expose metrics data to users via their web browsers, in plain-text and JSON.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5469) Exposing Hadoop metrics via HTTP

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681668#action_12681668 ] 

Steve Loughran commented on HADOOP-5469:
----------------------------------------

HtmlUnit would be the JAR to use to write tests for this.
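For an HTTP-level test of the servlet, HtmlUnit would drive it from Java; the same shape of test, sketched here with only the Python standard library against a stand-in handler (the path and payload are invented for illustration, not Hadoop's actual servlet or output), looks like this:

```python
# HTTP-level test sketch: start a fake /metrics endpoint, fetch it, and
# compare the body.  Everything here is a stand-in for illustration.

import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

FAKE_METRICS = b"jvm\n  metrics\n    {hostName=localhost}\n      gcCount=22\n"

class FakeMetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(FAKE_METRICS)
        else:
            self.send_error(404)

    def log_message(self, *args):    # keep test output quiet
        pass

def fetch_metrics(port):
    with urllib.request.urlopen("http://localhost:%d/metrics" % port) as resp:
        return resp.read()

if __name__ == "__main__":
    server = HTTPServer(("localhost", 0), FakeMetricsHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    assert fetch_metrics(server.server_port) == FAKE_METRICS
    server.shutdown()
```

Against the real daemon the test would point at the HttpServer's port instead of a fake handler, which is where HtmlUnit (or plain HttpURLConnection) would come in on the Java side.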

> Exposing Hadoop metrics via HTTP
> --------------------------------
>
>                 Key: HADOOP-5469
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5469
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: metrics
>            Reporter: Philip Zeyliger
>         Attachments: HADOOP-5469.patch
>
>



[jira] Commented: (HADOOP-5469) Exposing Hadoop metrics via HTTP

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696777#action_12696777 ] 

Chris Douglas commented on HADOOP-5469:
---------------------------------------

The changes to src/saveVersion.sh and VersionInfo seem unrelated to this issue...

> Exposing Hadoop metrics via HTTP
> --------------------------------
>
>                 Key: HADOOP-5469
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5469
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: metrics
>            Reporter: Philip Zeyliger
>            Assignee: Philip Zeyliger
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-5469.patch, HADOOP-5469.patch
>
>          Time Spent: 2h
>  Remaining Estimate: 1.5h
>
> Implement a "/metrics" URL on the HTTP server of Hadoop daemons, to expose metrics data to users via their web browsers, in plain-text and JSON.



[jira] Updated: (HADOOP-5469) Exposing Hadoop metrics via HTTP

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated HADOOP-5469:
------------------------------------

    Status: Open  (was: Patch Available)

> Exposing Hadoop metrics via HTTP
> --------------------------------
>
>                 Key: HADOOP-5469
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5469
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: metrics
>            Reporter: Philip Zeyliger
>         Attachments: HADOOP-5469.patch, HADOOP-5469.patch
>
>
> Implement a "/metrics" URL on the HTTP server of Hadoop daemons, to expose metrics data to users via their web browsers, in plain-text and JSON.



[jira] Commented: (HADOOP-5469) Exposing Hadoop metrics via HTTP

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681549#action_12681549 ] 

Hadoop QA commented on HADOOP-5469:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12402000/HADOOP-5469.patch
  against trunk revision 752984.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    -1 release audit.  The applied patch generated 647 release audit warnings (more than the trunk's current 645 warnings).

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/78/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/78/artifact/trunk/current/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/78/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/78/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/78/console

This message is automatically generated.

> Exposing Hadoop metrics via HTTP
> --------------------------------
>
>                 Key: HADOOP-5469
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5469
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: metrics
>            Reporter: Philip Zeyliger
>         Attachments: HADOOP-5469.patch
>
>



[jira] Updated: (HADOOP-5469) Exposing Hadoop metrics via HTTP

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated HADOOP-5469:
------------------------------------

    Status: Patch Available  (was: Open)

> Exposing Hadoop metrics via HTTP
> --------------------------------
>
>                 Key: HADOOP-5469
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5469
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: metrics
>            Reporter: Philip Zeyliger
>         Attachments: HADOOP-5469.patch
>
>



[jira] Reopened: (HADOOP-5469) Exposing Hadoop metrics via HTTP

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas reopened HADOOP-5469:
-----------------------------------


I reverted this because trunk no longer compiled.

> Exposing Hadoop metrics via HTTP
> --------------------------------
>
>                 Key: HADOOP-5469
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5469
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: metrics
>            Reporter: Philip Zeyliger
>            Assignee: Philip Zeyliger
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-5469.patch, HADOOP-5469.patch
>
>          Time Spent: 2h
>  Remaining Estimate: 1.5h
>
> Implement a "/metrics" URL on the HTTP server of Hadoop daemons, to expose metrics data to users via their web browsers, in plain-text and JSON.
