Posted to common-dev@hadoop.apache.org by "Yoram Arnon (JIRA)" <ji...@apache.org> on 2006/05/22 19:29:29 UTC

[jira] Created: (HADOOP-239) job tracker WI drops jobs after 24 hours

job tracker WI drops jobs after 24 hours
----------------------------------------

         Key: HADOOP-239
         URL: http://issues.apache.org/jira/browse/HADOOP-239
     Project: Hadoop
        Type: Bug

  Components: mapred  
    Reporter: Yoram Arnon
    Priority: Minor


The jobtracker's web interface (WI) keeps track of jobs executed in the past 24 hours.
If the cluster was idle for a day (say, Sunday), it drops all its history.
On Monday morning, the page is empty.
It would be better to store a fixed number of jobs (say, 10 each of succeeded and failed jobs).
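
To make the suggestion concrete, here is a minimal sketch of count-based retention, as opposed to the time-based cutoff described in the report. The class name RecentJobs and its methods are hypothetical and are not taken from the Hadoop code base; job IDs stand in for whatever per-job state the web interface would actually keep.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: retains the most recent N succeeded and N failed jobs. */
final class RecentJobs {
    private final int limit;
    private final Deque<String> succeeded = new ArrayDeque<>();
    private final Deque<String> failed = new ArrayDeque<>();

    RecentJobs(int limit) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be >= 1");
        }
        this.limit = limit;
    }

    /** Records a completed job, evicting the oldest job with the same outcome if full. */
    void record(String jobId, boolean success) {
        Deque<String> queue = success ? succeeded : failed;
        if (queue.size() == limit) {
            queue.removeFirst();
        }
        queue.addLast(jobId);
    }

    /** Oldest-first copy of the retained succeeded jobs. */
    List<String> succeeded() {
        return new ArrayList<>(succeeded);
    }

    /** Oldest-first copy of the retained failed jobs. */
    List<String> failed() {
        return new ArrayList<>(failed);
    }
}
```

With this scheme an idle Sunday changes nothing: entries are evicted only when a newer job with the same outcome arrives, never because time has passed.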

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "Sanjay Dahiya (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-239?page=comments#action_12426073 ] 
            
Sanjay Dahiya commented on HADOOP-239:
--------------------------------------

How about providing an additional JSP for job history? The current JSPs use live (not persistent) data, and they would change significantly if they had to read from a log file as well as from the live jobtracker. It would be cleaner to dump the job log to a file and read that file in a separate JSP, linked from the current page. Comments?


[jira] Commented: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "Yoram Arnon (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-239?page=comments#action_12426302 ] 
            
Yoram Arnon commented on HADOOP-239:
------------------------------------

If a job tracker fails while running a job, you'd want the failed job's information shown in the history, so you probably want to log a job in the file as soon as it's launched, rather than when it's completed.

There are several links that can be drilled down into from the job information - tasks, failed tasks, task trackers etc. Take care to store all that information in the log file so we do not lose any current functionality.

Creating a file per day may be good, as it provides a simple way of keeping a configurable length of history.
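
As an illustration of the file-per-day idea, the sketch below derives a file name from a date and selects the days that fall outside a configurable retention window. The class DailyHistoryFiles, the "history-<date>.log" naming, and the daysToKeep parameter are all assumptions made for this example; the sketch also uses java.time, which is newer than the Java versions available when this thread was written.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of per-day history files with a configurable retention window. */
final class DailyHistoryFiles {

    /** File name for a given day, for example history-2006-05-22.log (naming is an assumption). */
    static String fileNameFor(LocalDate day) {
        return "history-" + day + ".log";
    }

    /** Returns the days whose files are older than the window and can be deleted. */
    static List<LocalDate> expired(List<LocalDate> existing, LocalDate today, int daysToKeep) {
        // The window includes today, so the oldest kept day is (daysToKeep - 1) days back.
        LocalDate oldestKept = today.minusDays(daysToKeep - 1L);
        List<LocalDate> result = new ArrayList<>();
        for (LocalDate day : existing) {
            if (day.isBefore(oldestKept)) {
                result.add(day);
            }
        }
        return result;
    }
}
```

Pruning then reduces to deleting the files named by fileNameFor for each expired day, and the history length becomes a single integer setting.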


[jira] Updated: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "Sanjay Dahiya (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-239?page=all ]

Sanjay Dahiya updated HADOOP-239:
---------------------------------

    Attachment: Hadoop-239_1.patch

Updating patch with latest trunk. Up for review. 


[jira] Work started: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "Sanjay Dahiya (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-239?page=all ]

Work on HADOOP-239 started by Sanjay Dahiya.


[jira] Commented: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "Sanjay Dahiya (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-239?page=comments#action_12426325 ] 
            
Sanjay Dahiya commented on HADOOP-239:
--------------------------------------

Here is a first cut at the information to keep in the job history. When displaying it in a JSP, we can show it in different views, such as by host or by job:

 - jobid 
 - jobName
 - User
 - jobconf ( job.xml ) 
 - start time 
 - finish time
 - Status 
 - total maps 
 - total reduces
 - finished maps ( if make -k )
 - finished reduces ( if make -k )
 - Available task trackers list at job start (to find which hosts never ran any tasks; not sure if this is available from elsewhere?)
 
 - maps 
 	- taskid
 		- task attempt
 			- hostname
 			- start time
 			- finish time 
 			- error

 - reduces 
 	- taskid 
 		- task attempt
 			- host name 
 			- start time
 			- finish time
 			- phases 
 				- copy (?)
 					- start time
 					- finish time
 				- sort 
 					- start time
 					- finish time
 				- reduce 
 					- start time
 					- finish time
 			- status 
 			- error 
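
The nesting in the list above (job, then task, then task attempt) could be modeled roughly as follows. This is only a sketch: the class names AttemptRecord, TaskRecord, and JobRecord are hypothetical, and several listed fields (user, jobconf, status, the reduce phases) are omitted to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

/** One attempt at running a task on a particular host. */
final class AttemptRecord {
    final String hostName;
    final long startTime;
    final long finishTime;
    final String error; // null when the attempt recorded no error

    AttemptRecord(String hostName, long startTime, long finishTime, String error) {
        this.hostName = hostName;
        this.startTime = startTime;
        this.finishTime = finishTime;
        this.error = error;
    }
}

/** A map or reduce task together with all of its attempts. */
final class TaskRecord {
    final String taskId;
    final List<AttemptRecord> attempts = new ArrayList<>();

    TaskRecord(String taskId) {
        this.taskId = taskId;
    }
}

/** Job-level fields plus the nested map and reduce tasks. */
final class JobRecord {
    final String jobId;
    final String jobName;
    final List<TaskRecord> maps = new ArrayList<>();
    final List<TaskRecord> reduces = new ArrayList<>();

    JobRecord(String jobId, String jobName) {
        this.jobId = jobId;
        this.jobName = jobName;
    }

    /** Counts attempts, across maps and reduces, that recorded an error. */
    int failedAttempts() {
        List<TaskRecord> all = new ArrayList<>(maps);
        all.addAll(reduces);
        int count = 0;
        for (TaskRecord task : all) {
            for (AttemptRecord attempt : task.attempts) {
                if (attempt.error != null) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

A by-host view would then be a regrouping of the same attempt records by hostName rather than a separate data set.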
 	


[jira] Assigned: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "Sanjay Dahiya (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-239?page=all ]

Sanjay Dahiya reassigned HADOOP-239:
------------------------------------

    Assignee: Sanjay Dahiya


[jira] Updated: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "Sanjay Dahiya (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-239?page=all ]

Sanjay Dahiya updated HADOOP-239:
---------------------------------

    Attachment: Hadoop-239_3.patch

Thanks, Owen.
I changed the casing for the enums; here is the updated patch.


[jira] Commented: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-239?page=comments#action_12437898 ] 
            
Doug Cutting commented on HADOOP-239:
-------------------------------------

Perhaps we should change the string constants in JobInfo.java to be an enum (or two), to get the compiler to do a bit more work for us?  Or, we could use a record i/o record and XML output.  This way serializers and deserializers would be written for us.  That's perhaps overkill, but it's worth thinking about.

In any case, we need to make sure it's easy for these files to evolve.  Adding and removing fields should be trivial, and processing old files that have obsolete or missing fields should also be natural.  Right now, each new field requires a number of coordinated changes.  Is there any way we can reduce these?


[jira] Commented: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "Sanjay Dahiya (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-239?page=comments#action_12437921 ] 
            
Sanjay Dahiya commented on HADOOP-239:
--------------------------------------

Yes, we can use enums in JobInfo and other classes and store their String values in the files, so the history is still human readable. I will make this change. 
XML may not be a good option because the history is written in append mode, so the file is consistent at any point in time even if the JT crashes. With XML, either we rewrite the whole file on every change, or a JT crash will leave it malformed. That's the only reason I didn't use XML. 
I will think about how to simplify evolving this file and get back to you. 
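
The crash-safety argument for append mode can be sketched as follows: if every record is a single newline-terminated line, a reader can simply ignore whatever follows the last newline, since that can only be a record cut off mid-write. The class AppendOnlyLog and its method are hypothetical and are not the reader from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for a line-oriented, append-only history file. */
final class AppendOnlyLog {

    /** Splits raw file content into complete records and drops a trailing partial line. */
    static List<String> completeRecords(String content) {
        List<String> records = new ArrayList<>();
        int start = 0;
        int newline;
        while ((newline = content.indexOf('\n', start)) >= 0) {
            records.add(content.substring(start, newline));
            start = newline + 1;
        }
        // Anything after the last newline was never fully written and is ignored.
        return records;
    }
}
```

An XML document has no equivalent property, because it is only well formed once its closing tags have been written.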


[jira] Updated: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-239?page=all ]

Doug Cutting updated HADOOP-239:
--------------------------------

           Status: Resolved  (was: Patch Available)
    Fix Version/s: 0.7.0
       Resolution: Fixed

I just committed this.  Thanks, Sanjay!

> job tracker WI drops jobs after 24 hours
> ----------------------------------------
>
>                 Key: HADOOP-239
>                 URL: http://issues.apache.org/jira/browse/HADOOP-239
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Yoram Arnon
>         Assigned To: Sanjay Dahiya
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: Hadoop-239.patch, Hadoop-239_1.patch, Hadoop-239_2.patch, Hadoop-239_3.patch
>
>
> The jobtracker's WI, keeps track of jobs executed in the past 24 hours.
> if the cluster was idle for a day (say Sunday) it drops all its history.
> Monday morning, the page is empty.
> Better would be to store a fixed number of jobs (say 10 each of succeeded and failed jobs).
> Also, if the job tracker is restarted, it loses all its history.
> The history should be persistent, withstanding restarts and upgrades.


[jira] Commented: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "Sanjay Dahiya (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-239?page=comments#action_12431511 ] 
            
Sanjay Dahiya commented on HADOOP-239:
--------------------------------------

If we keep a single history file for the jobtracker, we will run into very large history files very soon, especially when there is a large number of small tasks. On the other hand, if we roll over the file every day, then the job start and end events for longer jobs, or for jobs that start at the end of the day, will be in different log files. We will still be able to see daily activity, but drilling into jobs will be a problem, as we will have to look through multiple huge files for job-specific events. 
Yoram and I discussed this over IM, and here is the current approach. 

We maintain a master file for all jobs. This file contains only job start/finish events, along with the number of tasks that had failed when the job finished. If the JobTracker dies before a job finishes, we don't log the number of failed tasks in this file. 

For each job we create a separate history log file and this file contains task/taskattempt start and finish times along with failures if any. 

The master index is rolled over every month. During rollover, we look for all jobs that have not finished, move them to the new file, and discard the old jobs. The detailed history logs for jobs older than a month will be deleted. 
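
The rollover step can be sketched as a filter over the old index: only entries for jobs that have not finished are carried into the new file. The types IndexEntry and MasterIndexRollover are hypothetical, and a real index entry would carry more than a job ID and a finished flag.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical master index entry: a job ID and whether a finish event was logged. */
final class IndexEntry {
    final String jobId;
    final boolean finished;

    IndexEntry(String jobId, boolean finished) {
        this.jobId = jobId;
        this.finished = finished;
    }
}

/** Hypothetical sketch of the monthly master index rollover. */
final class MasterIndexRollover {

    /** Entries to carry into the new month's index: only jobs that have not finished. */
    static List<IndexEntry> carryOver(List<IndexEntry> oldIndex) {
        List<IndexEntry> carried = new ArrayList<>();
        for (IndexEntry entry : oldIndex) {
            if (!entry.finished) {
                carried.add(entry);
            }
        }
        return carried;
    }
}
```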

The master index will be used to render the main JSP for job history; clicking on a job will cause the corresponding job file to be loaded, parsed, and displayed on the respective JSPs. 

The start time of the jobtracker is used as an extra key to uniquely identify jobs, since the same job IDs are reused when the jobtracker restarts. 
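
One way to picture the extra key is a small value class that pairs the jobtracker start time with the job ID, so that the same job ID from two different jobtracker runs maps to two different history entries. JobKey is a hypothetical name, and the string form it produces is an assumption rather than the format used by the patch.

```java
import java.util.Objects;

/** Hypothetical composite key: jobtracker start time plus job ID. */
final class JobKey {
    final long trackerStartTime;
    final String jobId;

    JobKey(long trackerStartTime, String jobId) {
        this.trackerStartTime = trackerStartTime;
        this.jobId = Objects.requireNonNull(jobId);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof JobKey)) {
            return false;
        }
        JobKey other = (JobKey) o;
        return trackerStartTime == other.trackerStartTime && jobId.equals(other.jobId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(trackerStartTime, jobId);
    }

    /** String form that could serve as a file name or index key (format is an assumption). */
    @Override
    public String toString() {
        return trackerStartTime + "_" + jobId;
    }
}
```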

We will not have any host specific view of tasks in this case. 


[jira] Commented: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "Sanjay Dahiya (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-239?page=comments#action_12438487 ] 
            
Sanjay Dahiya commented on HADOOP-239:
--------------------------------------

Here are some new changes ( will submit patch in a while ) 

Moved code to Java 5

Replaced all data attributes in JobInfo and other objects with HashMap. All key-value pairs are accessed through get(KEY), set(Key, Value) methods. All keys are defined in a single Enum, which acts like a global namespace for all keys used in job history. 

Adding new keys/values is now fairly simple: they only need to be added to the Enum. After that, they can be read and written without any change to the JobHistory implementation. 

Does this sound reasonable?
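
A minimal sketch of the key-value scheme described in this comment is shown below. The enum HistoryField and its constants are hypothetical (the thread does not list the real keys), and returning an empty string for an unset key is one possible way to tolerate older files that lack a field, not necessarily what the patch does.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical key namespace; the actual enum constants are not shown in the thread. */
enum HistoryField { JOBID, JOBNAME, SUBMIT_TIME, LAUNCH_TIME, FINISH_TIME }

/** Hypothetical record whose attributes live in a map keyed by the enum. */
class KeyValueRecord {
    private final Map<HistoryField, String> values = new HashMap<HistoryField, String>();

    /** Stores or overwrites the value for a key. */
    void set(HistoryField key, String value) {
        values.put(key, value);
    }

    /** Returns the stored value, or an empty string if the key was never set. */
    String get(HistoryField key) {
        String value = values.get(key);
        return value == null ? "" : value;
    }
}
```

Adding a field then means adding one enum constant; the record class itself does not change.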


[jira] Updated: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "Yoram Arnon (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-239?page=all ]

Yoram Arnon updated HADOOP-239:
-------------------------------

    Description: 
The jobtracker's WI, keeps track of jobs executed in the past 24 hours.
if the cluster was idle for a day (say Sunday) it drops all its history.
Monday morning, the page is empty.
Better would be to store a fixed number of jobs (say 10 each of succeeded and failed jobs).

Also, if the job tracker is restarted, it loses all its history.
The history should be persistent, withstanding restarts and upgrades.

  was:
The jobtracker's WI, keeps track of jobs executed in the past 24 hours.
if the cluster was idle for a day (say Sunday) it drops all its history.
Monday morning, the page is empty.
Better would be to store a fixed number of jobs (say 10 each of succeeded and failed jobs).



[jira] Commented: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "Sanjay Dahiya (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-239?page=comments#action_12426306 ] 
            
Sanjay Dahiya commented on HADOOP-239:
--------------------------------------

I agree, but to display the status of a running job we can use the existing infrastructure rather than the log file, as it already provides the most current status. We will log events as they happen, but use the log for display only when a job fails or completes, or when the job tracker restarts. 
Yes, the current functionality should be the minimum that's available in the log.


[jira] Updated: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "Sanjay Dahiya (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-239?page=all ]

Sanjay Dahiya updated HADOOP-239:
---------------------------------

    Attachment: Hadoop-239.patch

A patch for maintaining job tracker history. This patch: 
1. Instruments JobInProgress to add log events to the history file. 
2. Adds job history management with a job master index file and one history file per job, plus a parser for job history files. 
3. Contains a WI for viewing jobtracker history. 

This patch depends on Hadoop-263; it won't work otherwise. 


[jira] Commented: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "Sanjay Dahiya (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-239?page=comments#action_12428201 ] 
            
Sanjay Dahiya commented on HADOOP-239:
--------------------------------------

For the format of the history file on disk, here is a proposal. 

Since we want the file to be flushed on every write so that it survives jobtracker restarts, it makes sense to use a simple record-oriented structure in the file. Each log statement appends a record to the file. Since multiple jobs can be running at any time, their records can be intermixed in the log file (unless we use one history file per job). 
Using one history file per job is also a viable option, in which case we can separate the log files into different directories for different days and delete old files. 

In both cases the following simple file format can be used to log history and to parse/display it in JSPs. 
<recordType> <key=value> <key=value> .... 

where recordType = {JobInfo, Task, MapAttempt, ReduceTask, ReduceAttempt .... }
and keys will depend on recordType e.g. for JobInfo keys = {jobId, jobName, submitTime, launchTime ... }

For example, the log during job startup may look like this: 

JobInfo jobId=job_001 jobName=wordCount submitTime=0001
JobInfo jobId=job_001 launchTime=0002
...
JobInfo jobId=job_001 finishTime=0002

We can provide a proxy class, JobHistory, which exposes specific methods for logging the different log events and takes care of formatting issues in a central place. 
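
The proposed "<recordType> <key=value> ..." format is simple enough to format and parse in a few lines. The sketch below is not the JobHistory class from the patch; HistoryLine is a hypothetical name, and the sketch assumes that keys and values contain no whitespace, since the proposal does not say how values such as multi-word job names would be escaped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory form of one history record line. */
final class HistoryLine {
    final String recordType;
    final Map<String, String> fields;

    HistoryLine(String recordType, Map<String, String> fields) {
        this.recordType = recordType;
        this.fields = fields;
    }

    /** Formats the record as "<recordType> <key=value> <key=value> ...". */
    String format() {
        StringBuilder sb = new StringBuilder(recordType);
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append(' ').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    /** Parses one line; tokens without a key before '=' are skipped. */
    static HistoryLine parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        // LinkedHashMap keeps the fields in the order they appeared on the line.
        Map<String, String> fields = new LinkedHashMap<String, String>();
        for (int i = 1; i < tokens.length; i++) {
            int eq = tokens[i].indexOf('=');
            if (eq > 0) {
                fields.put(tokens[i].substring(0, eq), tokens[i].substring(eq + 1));
            }
        }
        return new HistoryLine(tokens[0], fields);
    }
}
```

Because the parser never requires any particular key to be present, a record with extra or missing keys still parses, which is the property that lets the file format evolve.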

comments ? 



[jira] Commented: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "Sanjay Dahiya (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-239?page=comments#action_12426369 ] 
            
Sanjay Dahiya commented on HADOOP-239:
--------------------------------------

That's for looking at job parameters (input/output, etc.) on the web page itself if we keep lots of jobs in the history. Currently we have it as a path in a temp directory displayed on the JSP. We can have the same path there or dump the jobConf into the log file. 


[jira] Commented: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "eric baldeschwieler (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-239?page=comments#action_12426154 ] 
            
eric baldeschwieler commented on HADOOP-239:
--------------------------------------------

+1 to the general idea of logging the data and reporting from the log.

It would be good to expand a bit on your idea, so it is clear what you are suggesting.


[jira] Commented: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "Yoram Arnon (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-239?page=comments#action_12426362 ] 
            
Yoram Arnon commented on HADOOP-239:
------------------------------------

Looks good.
What would the jobconf be used for?


[jira] Updated: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "Sanjay Dahiya (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-239?page=all ]

Sanjay Dahiya updated HADOOP-239:
---------------------------------

    Comment: was deleted


[jira] Commented: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "eric baldeschwieler (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-239?page=comments#action_12426324 ] 
            
eric baldeschwieler commented on HADOOP-239:
--------------------------------------------

+1 to the gist of this (Sanjay's latest suggestions and yoram's point about startup).

Putting the log in HDFS is interesting, but perhaps a distraction short term.

I think it would be worth trying to use the actual log infrastructure to store this information.  Rolling, compression, removal after a fixed time, no lost state when the server fails...  all of this sounds like logging.



[jira] Updated: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "Sanjay Dahiya (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-239?page=all ]

Sanjay Dahiya updated HADOOP-239:
---------------------------------

    Status: Patch Available  (was: In Progress)

> job tracker WI drops jobs after 24 hours
> ----------------------------------------
>
>                 Key: HADOOP-239
>                 URL: http://issues.apache.org/jira/browse/HADOOP-239
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Yoram Arnon
>         Assigned To: Sanjay Dahiya
>            Priority: Minor
>         Attachments: Hadoop-239.patch, Hadoop-239_1.patch
>
>
> The jobtracker's WI, keeps track of jobs executed in the past 24 hours.
> if the cluster was idle for a day (say Sunday) it drops all its history.
> Monday morning, the page is empty.
> Better would be to store a fixed number of jobs (say 10 each of succeeded and failed jobs).
> Also, if the job tracker is restarted, it loses all its history.
> The history should be persistent, withstanding restarts and upgrades.


[jira] Commented: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-239?page=comments#action_12439893 ] 
            
Owen O'Malley commented on HADOOP-239:
--------------------------------------

This is looking good, but the enum class names are all caps when they should be CamelCase. (So Keys instead of KEYS and Values instead of VALUES.)
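
To illustrate the convention being asked for, here is a small Java sketch. The enum type names are CamelCase, while the constants inside them stay in upper case as usual. The constants listed are hypothetical placeholders, not the ones in the patch.

```java
/** Hypothetical sketch of the naming convention; the constants are illustrative only. */
class JobHistoryNaming {
    // Type names use CamelCase: Keys, not KEYS.
    enum Keys { JOBID, JOBNAME, START_TIME, FINISH_TIME }

    // Likewise Values, not VALUES. The constants themselves remain upper case.
    enum Values { SUCCESS, FAILED }
}
```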

> job tracker WI drops jobs after 24 hours
> ----------------------------------------
>
>                 Key: HADOOP-239
>                 URL: http://issues.apache.org/jira/browse/HADOOP-239
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Yoram Arnon
>         Assigned To: Sanjay Dahiya
>            Priority: Minor
>         Attachments: Hadoop-239.patch, Hadoop-239_1.patch, Hadoop-239_2.patch
>
>
> The jobtracker's WI, keeps track of jobs executed in the past 24 hours.
> if the cluster was idle for a day (say Sunday) it drops all its history.
> Monday morning, the page is empty.
> Better would be to store a fixed number of jobs (say 10 each of succeeded and failed jobs).
> Also, if the job tracker is restarted, it loses all its history.
> The history should be persistent, withstanding restarts and upgrades.


[jira] Updated: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "Sanjay Dahiya (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-239?page=all ]

Sanjay Dahiya updated HADOOP-239:
---------------------------------

    Attachment: Hadoop-239_2.patch

Here is an updated patch. 


[jira] Commented: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "eric baldeschwieler (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-239?page=comments#action_12418381 ] 

eric baldeschwieler commented on HADOOP-239:
--------------------------------------------

This info should be logged, so that it is durable in the face of jobtracker restarts too.


[jira] Commented: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "eric baldeschwieler (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-239?page=comments#action_12412854 ] 

eric baldeschwieler commented on HADOOP-239:
--------------------------------------------

More please.


[jira] Commented: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "Sanjay Dahiya (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-239?page=comments#action_12426326 ] 
            
Sanjay Dahiya commented on HADOOP-239:
--------------------------------------

Repost; the tabs were lost in the last post.

Here is a first cut at the information in job history. While displaying in JSP we can show it in different views, such as by hosts or by jobs:


- jobid 
 - jobName
 - User
 - jobconf ( job.xml ) 
 - start time 
 - finish time
 - Status 
 - total maps 
 - total reduces
 - finished maps ( if make -k )
 - finished reduces ( if make -k )
 - Available task trackers list at job start (to find which hosts never ran any tasks, or whether hosts were added in between; not sure if this is available from elsewhere?)
 
 - maps 
    - taskid
        - task attempt
            - hostname
            - start time
            - finish time 
            - error

 - reduces 
    - taskid 
        - task attempt
            - host name 
            - start time
            - finish time
            - phases 
                - copy (?)
                    - start time
                    - finish time
                - sort 
                    - start time
                    - finish time
                - reduce 
                    - start time
                    - finish time
            - status 
            - error 
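
The outline above could be held in memory roughly as follows. This is only a sketch in Java under stated assumptions: all class and field names (JobHistorySketch, Job, Task, TaskAttempt, Phase, jobConfPath, taskTrackersAtStart) are hypothetical, times are assumed to be epoch milliseconds, and phases are only filled in for reduce attempts.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory shape for one job-history entry, following the outline above. */
class JobHistorySketch {

    /** One phase of a reduce attempt: copy, sort, or reduce. */
    static class Phase {
        final String name;
        final long startTime;
        final long finishTime;

        Phase(String name, long startTime, long finishTime) {
            this.name = name;
            this.startTime = startTime;
            this.finishTime = finishTime;
        }
    }

    /** One attempt at running a task on a particular host. */
    static class TaskAttempt {
        String hostname;
        long startTime;
        long finishTime;
        String status;
        String error;                                  // null when nothing went wrong
        final List<Phase> phases = new ArrayList<>();  // left empty for map attempts

        long elapsedMillis() {
            return finishTime - startTime;
        }
    }

    /** A map or reduce task together with all of its attempts. */
    static class Task {
        final String taskId;
        final List<TaskAttempt> attempts = new ArrayList<>();

        Task(String taskId) {
            this.taskId = taskId;
        }
    }

    /** Job-level fields plus the per-task detail. */
    static class Job {
        String jobId, jobName, user, jobConfPath, status;
        long startTime, finishTime;
        int totalMaps, totalReduces, finishedMaps, finishedReduces;
        final List<String> taskTrackersAtStart = new ArrayList<>();
        final List<Task> maps = new ArrayList<>();
        final List<Task> reduces = new ArrayList<>();
    }
}
```

Keeping attempts nested under tasks makes the "by jobs" view a direct traversal; a "by hosts" view would group the same TaskAttempt objects by hostname.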


[jira] Commented: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-239?page=comments#action_12438516 ] 
            
Doug Cutting commented on HADOOP-239:
-------------------------------------

> does this sound reasonable?

Yes, it sounds great to me!  Thanks!


[jira] Commented: (HADOOP-239) job tracker WI drops jobs after 24 hours

Posted by "Sanjay Dahiya (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-239?page=comments#action_12426298 ] 
            
Sanjay Dahiya commented on HADOOP-239:
--------------------------------------

I meant that on the JobTracker we log the job state transitions in a structured (XML?) persistent file. This is different from the standard log file and can be parsed and displayed by a JSP page (something like JobHistory.jsp). It includes only completed and failed jobs. So from the user interface perspective, running jobs are displayed as they are currently, but failed and completed jobs are displayed from this log.
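
As a rough illustration of a structured, parseable history record, here is a Java sketch that writes one event per line as key="value" pairs and reads it back. This swaps in a line-oriented format instead of the XML floated above, purely for brevity. The class name, the record type prefix, and the field names are hypothetical, and values are assumed not to contain newlines.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a one-line-per-event history record: TYPE KEY="value" KEY="value" ... */
class HistoryLineSketch {
    // A key, then a double-quoted value in which a backslash escapes the next character.
    private static final Pattern PAIR =
        Pattern.compile("(\\w+)=\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Render one record; backslashes and double quotes inside values are escaped. */
    static String format(String recordType, Map<String, String> fields) {
        StringBuilder sb = new StringBuilder(recordType);
        for (Map.Entry<String, String> e : fields.entrySet()) {
            String escaped = e.getValue().replace("\\", "\\\\").replace("\"", "\\\"");
            sb.append(' ').append(e.getKey()).append("=\"").append(escaped).append('"');
        }
        return sb.toString();
    }

    /** Parse the key/value pairs of one record; the leading record type is ignored. */
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            // Undo the escaping applied by format().
            fields.put(m.group(1), m.group(2).replaceAll("\\\\(.)", "$1"));
        }
        return fields;
    }
}
```

A JSP along the lines of the JobHistory.jsp mentioned above could call parse on each line of the file and render the resulting maps, without touching the live jobtracker state.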

We can add more information to this log than we have currently, for postmortem analysis, such as time spent in different phases or on different hosts (?).

This will require job ids that are unique across jobtracker restarts; otherwise it will be difficult to track jobs in history with the same id.
This log file can contain a configurable number of days of history, which can be browsed by time. Optionally this history can itself reside in HDFS. Does this make sense?
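
One possible way to get job ids that stay unique across restarts is to embed the jobtracker's start time in every id. The Java sketch below assumes that approach; the class name and the id layout are hypothetical, and it relies on two restarts never sharing the same start millisecond.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: job ids made unique across restarts by embedding the tracker start time. */
class JobIdSketch {
    private final long trackerStartMillis;
    private final AtomicInteger nextSequence = new AtomicInteger(1);

    JobIdSketch(long trackerStartMillis) {
        this.trackerStartMillis = trackerStartMillis;
    }

    /** Returns ids of the form job_<trackerStartMillis>_<sequence>. */
    String nextJobId() {
        return "job_" + trackerStartMillis + "_" + nextSequence.getAndIncrement();
    }
}
```

The per-tracker counter restarts at 1 after every restart, but the start-time component keeps old and new ids apart in the history file.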
