Posted to common-dev@hadoop.apache.org by "Michael Bieniosek (JIRA)" <ji...@apache.org> on 2007/05/02 01:28:15 UTC

[jira] Created: (HADOOP-1313) JobInProgress should be public (or implement a public interface)

JobInProgress should be public (or implement a public interface)
----------------------------------------------------------------

                 Key: HADOOP-1313
                 URL: https://issues.apache.org/jira/browse/HADOOP-1313
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
            Reporter: Michael Bieniosek


I'm trying to get programmatic access to Hadoop job/task status through the JobTracker API.

I notice that JobTracker returns a JobInProgress object from several public methods (runningJobs, getJob).  However, JobInProgress is a package-access class.  So, oddly, I can call JobTracker.getJob(), but I can't store the result as a JobInProgress (I suppose I could store it as an Object, but then I couldn't downcast it back).

The JobInProgress object gives me useful information about jobs, so I don't think making runningJobs/getJob non-public is a good idea.  I gather from HADOOP-28 that JobInProgress is not public because nobody wants to maintain compatibility for this class across Hadoop versions.

So it would probably be best if we created public interfaces that JobInProgress and TaskInProgress implement.  I only care about the accessors, so maybe from JobInProgress we could expose (getProfile, getStatus, get*Time, {finished,desired,running}{Maps,Reduces}, getMapTasks, getCounters) and from TaskInProgress (isRunning, isComplete, isFailed, isMapTask, numTaskFailures, numKilledTasks, getProgress, getCounters).
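
A minimal sketch of what such a read-only interface might look like. The names JobInfo, JobInProgressSketch and JobInfoDemo are hypothetical, the return types are simplified, and only a few of the accessors listed above are shown; this is not the real JobInProgress.

```java
// Hypothetical read-only view of a job. In a real code base this would be a
// public interface in its own file; it is left non-public here only so that
// the whole sketch fits in a single source file.
interface JobInfo {
    int desiredMaps();
    int finishedMaps();
    int runningMaps();
    long getStartTime();
}

// Stand-in for the package-access implementation class (not the real JobInProgress).
class JobInProgressSketch implements JobInfo {
    private final int desiredMaps;
    private final long startTime;
    private int finishedMaps;
    private int runningMaps;

    JobInProgressSketch(int desiredMaps, long startTime) {
        this.desiredMaps = desiredMaps;
        this.startTime = startTime;
    }

    // Mutators stay off the interface, so callers holding a JobInfo cannot reach them.
    void mapStarted() { runningMaps++; }
    void mapFinished() { runningMaps--; finishedMaps++; }

    @Override public int desiredMaps() { return desiredMaps; }
    @Override public int finishedMaps() { return finishedMaps; }
    @Override public int runningMaps() { return runningMaps; }
    @Override public long getStartTime() { return startTime; }
}

public class JobInfoDemo {
    // Client code depends only on the interface, never on the implementation class.
    static String summarize(JobInfo job) {
        return job.finishedMaps() + "/" + job.desiredMaps()
                + " maps finished, " + job.runningMaps() + " running";
    }

    public static void main(String[] args) {
        JobInProgressSketch job = new JobInProgressSketch(4, System.currentTimeMillis());
        job.mapStarted();
        job.mapStarted();
        job.mapFinished();
        System.out.println(summarize(job));
    }
}
```

The point of the split is that the implementation class can keep changing between versions while only the small accessor interface has to stay compatible.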

Any thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1313) JobInProgress should be public (or implement a public interface)

Posted by "Michael Bieniosek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492993 ] 

Michael Bieniosek commented on HADOOP-1313:
-------------------------------------------

The webapps use the JobTracker, so I assumed I would be able to use its interface as well.  It seems odd that the webapp is privileged.  

JobClient and JobSubmissionProtocol don't provide the actual number of running/completed tasks; they only seem to provide the progress as a float.  I also want ClusterStatus.  ClusterStatus is available from JobSubmissionProtocol as well as JobTracker, but JobSubmissionProtocol is not available from JobClient.


[jira] Resolved: (HADOOP-1313) JobInProgress should be public (or implement a public interface)

Posted by "Michael Bieniosek (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Bieniosek resolved HADOOP-1313.
---------------------------------------

    Resolution: Invalid


[jira] Commented: (HADOOP-1313) JobInProgress should be public (or implement a public interface)

Posted by "Michael Bieniosek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492995 ] 

Michael Bieniosek commented on HADOOP-1313:
-------------------------------------------

Sorry, I'm wrong.  I can get ClusterStatus from JobClient.

So the only thing I care about is the actual number of running/completed tasks, which isn't in JobClient.  

I'm also vaguely uncomfortable with the privileged access of the webapp/ to the JobTracker.


[jira] Commented: (HADOOP-1313) JobInProgress should be public (or implement a public interface)

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492988 ] 

Doug Cutting commented on HADOOP-1313:
--------------------------------------

The standard public API for such things is JobClient, not JobTracker.  Is that not sufficient?  It may not be, but I think that's the thing to change, rather than exposing stuff through JobTracker directly.  To my thinking, the only reason that JobTracker is a public class is so that one can start and stop the daemon.  Other interaction should be through JobClient.  Could that work for you?


[jira] Commented: (HADOOP-1313) JobInProgress should be public (or implement a public interface)

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493146 ] 

Doug Cutting commented on HADOOP-1313:
--------------------------------------

> So the only thing I care about is the actual number of running/completed tasks, which isn't in JobClient.

Isn't it?  The ClusterStatus includes the number of running map and reduce tasks.  The number of completed tasks can be determined from getMapTaskReports() and getReduceTaskReports(), or incrementally with getJob().getTaskCompletionEvents().  What's missing?
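
The counting approach described above could be sketched roughly as follows. The Report class is a hypothetical stand-in for the per-task reports returned by getMapTaskReports() and getReduceTaskReports(), so that the example runs without a cluster, and treating a progress of 1.0 as "completed" is an assumption rather than something confirmed in this thread.

```java
import java.util.Arrays;
import java.util.List;

public class TaskCountSketch {
    // Hypothetical stand-in for a per-task report; only the fields used here.
    static final class Report {
        final String taskId;
        final float progress; // 0.0 to 1.0

        Report(String taskId, float progress) {
            this.taskId = taskId;
            this.progress = progress;
        }
    }

    // Counts reports whose progress has reached 1.0 (assumed to mean "completed").
    static int countCompleted(List<Report> reports) {
        int completed = 0;
        for (Report r : reports) {
            if (r.progress >= 1.0f) {
                completed++;
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        List<Report> mapReports = Arrays.asList(
                new Report("map_0", 1.0f),
                new Report("map_1", 0.5f),
                new Report("map_2", 1.0f));
        System.out.println(countCompleted(mapReports) + " of "
                + mapReports.size() + " map tasks completed");
    }
}
```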

> I'm also vaguely uncomfortable with the privileged access of the webapp/ to the JobTracker.

Me too.  I think the webapp should ideally use only JobClient to access the JobTracker.  Not because I don't trust the webapp, but rather because I think that anything that's visible in the webapp should be accessible to user programs.  Probably we should put the webapp in a separate package.  We should also split JobTracker into a public launcher class and a package-private class that implements various protocols, so that the protocol implementation methods are not publicly visible, tempting folks to call them directly.  But such cleanups are separate issues.  If you feel strongly enough, please file issues and submit patches for this.
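
One possible shape for the launcher/implementation split, as a sketch only. All names here (SubmissionProtocolSketch, TrackerImplSketch, TrackerLauncherSketch) are hypothetical and the protocol is reduced to a single made-up method; none of this is the actual JobTracker code.

```java
// Hypothetical protocol interface, reduced to one method for illustration.
interface SubmissionProtocolSketch {
    String submit(String jobName);
}

// Package-private implementation: code outside the package cannot name this
// class, so it cannot call the protocol methods on it directly.
class TrackerImplSketch implements SubmissionProtocolSketch {
    private boolean running;
    private int nextId = 1;

    void start() { running = true; }
    void stop() { running = false; }
    boolean isRunning() { return running; }

    @Override
    public String submit(String jobName) {
        if (!running) {
            throw new IllegalStateException("tracker is not running");
        }
        return "job_" + (nextId++) + "_" + jobName;
    }
}

// Public launcher: exposes only daemon lifecycle, not the protocol methods.
public final class TrackerLauncherSketch {
    private final TrackerImplSketch impl = new TrackerImplSketch();

    public void start() { impl.start(); }
    public void stop() { impl.stop(); }
    public boolean isRunning() { return impl.isRunning(); }

    public static void main(String[] args) {
        TrackerLauncherSketch launcher = new TrackerLauncherSketch();
        launcher.start();
        System.out.println("running: " + launcher.isRunning());
        launcher.stop();
        System.out.println("running: " + launcher.isRunning());
    }
}
```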
