Posted to common-dev@hadoop.apache.org by "Florian Leibert (JIRA)" <ji...@apache.org> on 2008/10/31 16:57:44 UTC

[jira] Created: (HADOOP-4559) Rest API for retrieving job / task statistics

Rest API for retrieving job / task statistics 
----------------------------------------------

                 Key: HADOOP-4559
                 URL: https://issues.apache.org/jira/browse/HADOOP-4559
             Project: Hadoop Core
          Issue Type: New Feature
            Reporter: Florian Leibert
            Priority: Trivial
             Fix For: 0.20.0


A REST API that returns a simple JSON document containing information about a given job, such as min/max/avg times per task, failed tasks, etc. This would be useful for allowing an external process to restart a run or modify its parameters.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4559) Rest API for retrieving job / task statistics

Posted by "Paco Nathan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650838#action_12650838 ] 

Paco Nathan commented on HADOOP-4559:
-------------------------------------

> Isn't most of this provided through job history?

No, not really. Not if a long-running workflow requires these measurements for automated decisions.

While a human can *read* the job history data from JSP pages, there's no current means for the app code which calls ToolRunner to obtain that data and use it to alter the workflow.
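
To make the "automated decisions" use case concrete, here is a minimal sketch of a decision the calling code could make once it has the statistics in hand. The class name, the Action values, and the failed-fraction threshold policy are all hypothetical; nothing here comes from Hadoop or from the attached patch.

```java
final class WorkflowDecisionSketch {
    /** Hypothetical outcomes a workflow driver might choose between. */
    enum Action { CONTINUE, RERUN_WITH_NEW_PARAMS }

    /**
     * Illustrative policy only: rerun the step when the fraction of failed
     * tasks exceeds maxFailedFraction, otherwise carry on.
     */
    static Action decide(int failedTasks, int totalTasks, double maxFailedFraction) {
        if (totalTasks <= 0) {
            throw new IllegalArgumentException("totalTasks must be positive");
        }
        double failedFraction = (double) failedTasks / totalTasks;
        return failedFraction > maxFailedFraction
                ? Action.RERUN_WITH_NEW_PARAMS
                : Action.CONTINUE;
    }

    public static void main(String[] args) {
        // 3 of 10 tasks failed, tolerance is 20%
        System.out.println(decide(3, 10, 0.2));
    }
}
```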

> Rest API for retrieving job / task statistics 
> ----------------------------------------------
>
>                 Key: HADOOP-4559
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4559
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Florian Leibert
>            Priority: Trivial
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-4559.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> a rest api that returns a simple JSON containing information about a given job such as:  min/max/avg times per task, failed tasks, etc. This would be useful in order to allow external restart or modification of parameters of a run.



[jira] Commented: (HADOOP-4559) Rest API for retrieving job / task statistics

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644689#action_12644689 ] 

Steve Loughran commented on HADOOP-4559:
----------------------------------------

- Although it's a JSP page, everything, including printing, is done in Java code. It would be better either implemented as a pure servlet or with the output redone as <%= %> expressions to produce something more JSP-y.

- I recommend HtmlUnit as the best extension to JUnit for testing web pages; it could grab the pages and look at the content. 



[jira] Updated: (HADOOP-4559) Rest API for retrieving job / task statistics

Posted by "Florian Leibert (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Florian Leibert updated HADOOP-4559:
------------------------------------

    Attachment: HADOOP-4559.patch

This provides a very simple API for retrieving statistics about the tasks of a given job ID, such as average, min and max times per task, failed tasks per job, total job runtime, etc.
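
As a rough illustration of the kind of aggregation described above, the sketch below reduces per-task durations to min, max, and average and renders them as a small JSON object. The class name, the long[] input, and the JSON field names are assumptions for illustration; they are not taken from the patch.

```java
import java.util.Locale;
import java.util.LongSummaryStatistics;
import java.util.stream.LongStream;

final class TaskStatsSketch {
    /** Summarizes task durations in milliseconds (hypothetical helper). */
    static LongSummaryStatistics summarize(long[] taskDurationsMillis) {
        return LongStream.of(taskDurationsMillis).summaryStatistics();
    }

    /** Renders the summary as JSON; the field names are made up for this sketch. */
    static String toJson(LongSummaryStatistics s) {
        return String.format(Locale.ROOT,
                "{\"min\":%d,\"max\":%d,\"avg\":%.1f}",
                s.getMin(), s.getMax(), s.getAverage());
    }

    public static void main(String[] args) {
        // Assumes at least one task; an empty array would need separate handling.
        System.out.println(toJson(summarize(new long[] {1200L, 800L, 1000L})));
    }
}
```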



[jira] Updated: (HADOOP-4559) Rest API for retrieving job / task statistics

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-4559:
----------------------------------

    Status: Open  (was: Patch Available)

bq. although it's a JSP page, everything, including printing, is done in Java code. It would either be better implemented as a pure servlet
+1

* Please format the code according to the [conventions|http://wiki.apache.org/hadoop/HowToContribute#head-59ae13df098fbdcc46abdf980aa8ee76d3ee2e3b].
* There's a fair amount of dead code in this patch, e.g.
{noformat}
+	    StringBuffer sb = new StringBuffer();
+	    boolean isFirst = true;
+	    for (String kv : kv_pairs) {
+	    	
+			sb.append(kv);	
+		}
{noformat}
{{kv_pairs}} is initialized, but empty. {{sb}} is unused, save in this loop. The loop above it doesn't appear to do any productive work. StringBuilder should be used instead of StringBuffer in this context.
* If you're proposing this as a public API, it must at least have a unit test.
* Isn't most of this provided through job history?
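
On the dead-code point above: one guess is that the loop was meant to join the key/value pairs with a separator, which would explain the otherwise unused isFirst flag. The sketch below shows that with StringBuilder. The class name, method name, and the "&" separator are assumptions, not taken from the patch.

```java
import java.util.Arrays;
import java.util.List;

final class KvJoinSketch {
    /** Joins the pairs with the given separator, skipping it before the first pair. */
    static String join(List<String> kvPairs, String separator) {
        StringBuilder sb = new StringBuilder();
        boolean isFirst = true;
        for (String kv : kvPairs) {
            if (!isFirst) {
                sb.append(separator);
            }
            sb.append(kv);
            isFirst = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("info=jobdetails", "id=1"), "&"));
    }
}
```

StringBuilder is the unsynchronized counterpart of StringBuffer, so it is the usual choice when the buffer never leaves a single method.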



[jira] Commented: (HADOOP-4559) Rest API for retrieving job / task statistics

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644751#action_12644751 ] 

Hadoop QA commented on HADOOP-4559:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12393159/HADOOP-4559.patch
  against trunk revision 709609.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3515/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3515/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3515/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3515/console

This message is automatically generated.



[jira] Updated: (HADOOP-4559) Rest API for retrieving job / task statistics

Posted by "Florian Leibert (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Florian Leibert updated HADOOP-4559:
------------------------------------

    Attachment: HADOOP-4559v2.patch

The previous version was a bit dirty; I think this one is quite an improvement. We're using it to gather a lot of stats for our job runs. It's not a servlet and doesn't include HtmlUnit: I don't think one stats JSP justifies adding another library to the distribution, and for the sake of simplicity this remains a JSP. I hope this is valuable to someone else as well; it really is useful for us in tracking performance when modifying our algorithm.




[jira] Issue Comment Edited: (HADOOP-4559) Rest API for retrieving job / task statistics

Posted by "Paco Nathan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644792#action_12644792 ] 

pacoid edited comment on HADOOP-4559 at 11/3/08 11:50 AM:
---------------------------------------------------------------

HADOOP-4559 provides a workaround for part of the issue described in HADOOP-3850.  We can now access log data by making REST calls to the JSP provided in 3850. For example:

   // Appears to use Commons HttpClient 3.x:
   //   org.apache.commons.httpclient.{HttpClient, HttpMethod}
   //   org.apache.commons.httpclient.methods.GetMethod
   RunningJob currentjob = JobClient.runJob(job_conf);

   JobID id = currentjob.getID();
   String url = "http://localhost:50030/api.jsp?info=jobdetails&id=" + id.getId();

   HttpClient client = new HttpClient();
   HttpMethod method = new GetMethod(url);

   String logData;
   try {
     client.executeMethod(method);
     logData = method.getResponseBodyAsString();
   } finally {
     method.releaseConnection();  // release the connection even if the request fails
   }
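
For readers without Commons HttpClient on the classpath, the same request can be sketched with only the JDK's java.net.http client (Java 11 or later). The api.jsp URL shape is copied from the snippet above; the class and method names are hypothetical, and the fetch only succeeds against a JobTracker that actually has the patch applied.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class JobStatsFetchSketch {
    /** Builds the job-details URL in the same shape as the example above. */
    static URI jobDetailsUri(String host, int port, int jobNumber) {
        return URI.create("http://" + host + ":" + port
                + "/api.jsp?info=jobdetails&id=" + jobNumber);
    }

    /** Performs a blocking GET and returns the response body as a string. */
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        URI uri = jobDetailsUri("localhost", 50030, 1);
        System.out.println(uri);
        // Only contact the server when explicitly asked to.
        if (args.length == 1 && "--fetch".equals(args[0])) {
            System.out.println(fetch(uri));
        }
    }
}
```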


      was (Author: pacoid):
    HADOOP-4559 provides a workaround for the issue described in HADOOP-3850.  We can now access the log data by making REST calls to JSP provided in 3850.

   RunningJob currentjob = JobClient.runJob(job_conf);
   String urlPrefix = "http://localhost:50030/api.jsp?info=jobdetails&id=";
        final JobID id = currentjob.getID();
        final int id_int = id.getId();
        final String url = urlPrefix + id_int;
        final String json = getStringFromREST(url);

  


[jira] Updated: (HADOOP-4559) Rest API for retrieving job / task statistics

Posted by "Florian Leibert (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Florian Leibert updated HADOOP-4559:
------------------------------------

    Attachment:     (was: HADOOP-4559.patch)



[jira] Updated: (HADOOP-4559) Rest API for retrieving job / task statistics

Posted by "Florian Leibert (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Florian Leibert updated HADOOP-4559:
------------------------------------

    Release Note: Adds API features to the webapp part of Hadoop that allow retrieving task stats for a given job
          Status: Patch Available  (was: Open)



[jira] Commented: (HADOOP-4559) Rest API for retrieving job / task statistics

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670549#action_12670549 ] 

Steve Loughran commented on HADOOP-4559:
----------------------------------------

+1 to Bill's idea for a RESTy API, one that works long-haul. 



[jira] Commented: (HADOOP-4559) Rest API for retrieving job / task statistics

Posted by "Bill de hOra (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651447#action_12651447 ] 

Bill de hOra commented on HADOOP-4559:
--------------------------------------



{code}
JobID id = currentjob.getID();
String url = "http://localhost:50030/api.jsp?info=jobdetails&id=" + id.getId();
{code}

Can't you just call this a JSP into the jobtracker instead? I hate to nitpick, but the request isn't REST style (the client constructs the URL), nor is the response (it contains no links), and ASF code should (imvho) know the difference. If you want to build REST-style tooling around the tracker, I'd be happy to help with that. For example, scaling this up to a lot of jobs and/or a lot of clients will require something that doesn't hammer the tracker. And iterating over the tracker seems like a linear bottleneck; an O(1) key lookup would be much better.
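
The difference between scanning and keyed lookup can be sketched as follows. This is a generic illustration, not the jobtracker's actual data structures: the JobSummary type, its fields, and the job ID strings are all made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class JobLookupSketch {
    /** Hypothetical per-job record. */
    static final class JobSummary {
        final String jobId;
        final long runtimeMillis;

        JobSummary(String jobId, long runtimeMillis) {
            this.jobId = jobId;
            this.runtimeMillis = runtimeMillis;
        }
    }

    /** Linear scan: cost grows with the number of jobs on every request. */
    static Optional<JobSummary> findByScan(List<JobSummary> jobs, String jobId) {
        for (JobSummary job : jobs) {
            if (job.jobId.equals(jobId)) {
                return Optional.of(job);
            }
        }
        return Optional.empty();
    }

    /** Builds the index once; each later lookup is an expected O(1) hash probe. */
    static Map<String, JobSummary> indexById(List<JobSummary> jobs) {
        Map<String, JobSummary> index = new HashMap<>();
        for (JobSummary job : jobs) {
            index.put(job.jobId, job);
        }
        return index;
    }

    public static void main(String[] args) {
        List<JobSummary> jobs = new ArrayList<>();
        jobs.add(new JobSummary("job_0001", 5000L));
        jobs.add(new JobSummary("job_0002", 7000L));
        Map<String, JobSummary> index = indexById(jobs);
        System.out.println(index.get("job_0002").runtimeMillis);
    }
}
```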
