Posted to common-dev@hadoop.apache.org by "Alejandro Abdelnur (JIRA)" <ji...@apache.org> on 2007/09/11 08:46:32 UTC

[jira] Created: (HADOOP-1876) Persisting completed jobs status

Persisting completed jobs status
--------------------------------

                 Key: HADOOP-1876
                 URL: https://issues.apache.org/jira/browse/HADOOP-1876
             Project: Hadoop
          Issue Type: Improvement
          Components: mapred
         Environment: all
            Reporter: Alejandro Abdelnur
            Priority: Minor


Currently the JobTracker keeps information about completed jobs in memory.

This information is flushed from the cache when it has outlived #RETIRE_JOB_INTERVAL or when the limit of completed jobs in memory has been reached (#MAX_COMPLETE_USER_JOBS_IN_MEMORY).

Also, if the JobTracker is restarted (due to being recycled or due to a crash), information about completed jobs is lost.

If any of the above scenarios happens before the job information is queried by a Hadoop client (normally the job submitter or a monitoring component), there is no way to obtain that information.

A way to avoid this is for the JobTracker to persist the completed job information in DFS upon job completion, at the time the job is moved to the completed jobs queue. Then, when the JobTracker is queried for information about a completed job, if it is not found in the memory queue, a lookup in DFS would be done to retrieve the completed job information.

A directory in DFS (under mapred/system) would be used to persist completed job information. For each completed job there would be a directory named with the job ID, holding all the information about the job: status, job profile, counters and completion events.

A configuration property will indicate how long persisted job information should be kept in DFS. After that period it will be cleaned up automatically.

This improvement would not introduce API changes.
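The proposed flow can be sketched as follows. This is a self-contained illustration, not the actual patch: CompletedJobStore, the per-job "status" file, and the local temp directory (standing in for the mapred/system DFS directory) are all hypothetical stand-ins.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Sketch of the persist-on-completion / read-on-miss flow described above.
// A local temp directory stands in for the mapred/system DFS directory.
public class CompletedJobStore {
    private final Map<String, String> memoryQueue = new HashMap<>();
    private final Path baseDir;

    public CompletedJobStore() {
        try {
            baseDir = Files.createTempDirectory("completed-jobs");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hook run when a job moves to the completed-jobs queue: keep it in
    // memory and also persist it under <baseDir>/<jobId>/status.
    public void jobCompleted(String jobId, String status) {
        memoryQueue.put(jobId, status);
        try {
            Path jobDir = Files.createDirectories(baseDir.resolve(jobId));
            Files.write(jobDir.resolve("status"), status.getBytes("UTF-8"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Simulates retirement: info leaves memory but survives in the store.
    public void retire(String jobId) {
        memoryQueue.remove(jobId);
    }

    // Query path: check memory first, then fall back to the persistent store.
    public String getStatus(String jobId) {
        String inMemory = memoryQueue.get(jobId);
        if (inMemory != null) return inMemory;
        try {
            Path statusFile = baseDir.resolve(jobId).resolve("status");
            if (!Files.exists(statusFile)) return null;
            return new String(Files.readAllBytes(statusFile), "UTF-8");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the design is visible in getStatus(): the caller cannot tell whether the answer came from memory or from the store.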


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1876) Persisting completed jobs status

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556472#action_12556472 ] 

Alejandro Abdelnur commented on HADOOP-1876:
--------------------------------------------

Yes, JobHistory could be tweaked to save data in DFS instead of the local filesystem, however:

  * JobHistory data is not accessible via the RunningJob API
  * JobHistory API is not accessible from a client
  * JobHistory information does not contain counter information

The nice thing about the proposed patch is that for the client it is completely transparent whether the job info is in memory or already persisted; it is always accessible via the RunningJob API. This makes a monitoring client much simpler.

Also, the code changes in the JobTracker are minimal: just hooks to persist the job info upon job completion and to read it from DFS if it is no longer in memory.


> Persisting completed jobs status
> --------------------------------
>
>                 Key: HADOOP-1876
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1876
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>         Environment: all
>            Reporter: Alejandro Abdelnur
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: patch1876.txt, patch1876.txt
>



[jira] Updated: (HADOOP-1876) Persisting completed jobs status

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur updated HADOOP-1876:
---------------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (HADOOP-1876) Persisting completed jobs status

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur updated HADOOP-1876:
---------------------------------------

    Status: Open  (was: Patch Available)

I'll change the code so FindBugs does not complain; I also found a bug in the run() method of the new thread.



[jira] Commented: (HADOOP-1876) Persisting completed jobs status

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556584#action_12556584 ] 

Runping Qi commented on HADOOP-1876:
------------------------------------


The point is that you don't need to write RunningJobs out at all. 
You can re-create them from the JobHistory log. The job tracker 
can log the counter info to the job history files at the completion of a job
in the same way as other data is logged.

And Hadoop already has a JobHistory log parser, thus you don't need 
to write much new code for parsing the log file.

The JobHistory log file is one file per job, thus the performance of extracting the data for
a job is independent of how many jobs there are. I don't think performance is a concern here.
Actually, I believe it will be much faster than to extract from a DFS based persistent store.
 
It will be fine if we want to archive the job history log files when they become too old.
That should be optional.
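The event-based reconstruction described above can be illustrated with a toy replay. Note the line format here ("<time> <taskId> <state>") is invented for the example and is not the real JobHistory format:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of deriving per-state task counts by replaying an
// event-based log, as described in the comment above. The event format
// is made up; the real JobHistory format differs.
public class EventLogReplay {
    // Replays events in timestamp order; the last event for a task wins,
    // giving that task's final state. Returns task counts per state.
    public static Map<String, Integer> stateCounts(List<String> events) {
        Map<String, String> taskState = new HashMap<>();   // taskId -> state
        for (String line : events) {
            String[] parts = line.split(" ");              // time, taskId, state
            taskState.put(parts[1], parts[2]);
        }
        Map<String, Integer> counts = new HashMap<>();
        for (String state : taskState.values()) {
            counts.merge(state, 1, Integer::sum);
        }
        return counts;
    }
}
```

Sampling the replay at intermediate timestamps instead of only at the end would yield the time-series data (e.g. running vs. completed mappers over time) mentioned in the discussion.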



[jira] Commented: (HADOOP-1876) Persisting completed jobs status

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557618#action_12557618 ] 

Devaraj Das commented on HADOOP-1876:
-------------------------------------

bq. Should we broaden the scope HADOOP-2178 to re-work JobHistory to use Writables rather than the custom format? Or is that a new jira?

This should be a new jira IMO.



[jira] Commented: (HADOOP-1876) Persisting completed jobs status

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557292#action_12557292 ] 

Runping Qi commented on HADOOP-1876:
------------------------------------


I am fine with the approach of this patch if it turns out to be simpler than using JobHistory.

Can this patch make JobHistory log obsolete? Or at least is that intended?
I hate to see same information logged at different places 
in different forms using different code paths.

Other than being in text format (which has its pros and cons), the job history log is event based,
from which you can reconstruct the whole execution history of a job and derive various time-series data,
such as the number of mappers/reducers in different states (waiting, running, sorting, shuffling, completed, etc.).
This kind of information is important for understanding the runtime behavior of the job.

If RunningJob can easily accommodate those kinds of time-series data, then I am OK with obsoleting the job history log.

Also, have folks considered the relationship between this and HADOOP-2178?





[jira] Commented: (HADOOP-1876) Persisting completed jobs status

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557173#action_12557173 ] 

Alejandro Abdelnur commented on HADOOP-1876:
--------------------------------------------

JobHistory has a log file per job for the details, but it also has a single master file with some basic info. This makes it difficult to use DFS, as append is not supported at the moment.

The reason for using DFS to store the job info is that if the JobTracker box dies I could bring the JobTracker up on another box and still have the job info of completed jobs from previous runs. The configured directory could point to a local FS directory if this is not a concern.

The way JobHistory splits and processes the log information would make it difficult, and require much more code than the proposed patch, to recreate the RunningJob object out of the JobHistory log files. RunningJob/Counters/JobStatus/CompletedTasks are all Writable implementations, so writing/reading is already taken care of by them automatically.

It seems to me that it would be much easier to retrofit JobHistory to use info out of the files the patch is writing than the other way around.

In my opinion the use of the JobHistory log files is very different from the use of the proposed patch.

Also note that by default the proposed patch does not persist any job info; it does so only if explicitly configured. So there is no penalty in performance/storage if it is not activated.
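The Writable point above is the core of the argument: a class that implements write()/readFields() serializes itself for free. A minimal self-contained sketch of the pattern (MiniWritable mirrors Hadoop's org.apache.hadoop.io.Writable interface; MiniJobStatus and its fields are invented for the example):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in for org.apache.hadoop.io.Writable, which defines exactly
// these two methods.
interface MiniWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Illustrative job-status object; the fields are made up for the example.
public class MiniJobStatus implements MiniWritable {
    String jobId = "";
    int runState;   // e.g. 2 = SUCCEEDED in this toy encoding

    public void write(DataOutput out) throws IOException {
        out.writeUTF(jobId);
        out.writeInt(runState);
    }

    public void readFields(DataInput in) throws IOException {
        jobId = in.readUTF();
        runState = in.readInt();
    }

    // Round-trips the object through a byte stream, the same way the
    // JobTracker could round-trip it through a DFS file.
    public static MiniJobStatus roundTrip(MiniJobStatus s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            s.write(new DataOutputStream(bytes));
            MiniJobStatus copy = new MiniJobStatus();
            copy.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because JobStatus, Counters, etc. already implement this contract, the persistence code needs no custom parser, unlike the text-based JobHistory format.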




[jira] Updated: (HADOOP-1876) Persisting completed jobs status

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1876:
----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Alejandro!



[jira] Updated: (HADOOP-1876) Persisting completed jobs status

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur updated HADOOP-1876:
---------------------------------------

    Attachment: patch1876.txt

Added a config property to activate job status persistence. The default is FALSE.

If false, the clean-up thread is not started.
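That gating can be sketched as below. The property name and the use of java.util.Properties (in place of Hadoop's Configuration) are illustrative assumptions, not the actual patch:

```java
import java.util.Properties;

// Sketch of the opt-in behavior described above: persistence, and with it
// the clean-up thread, is activated only when a config property is true.
// The property name and Properties-based config are stand-ins.
public class PersistenceConfig {
    private Thread cleanupThread;   // stays null when persistence is off
    private final boolean active;

    public PersistenceConfig(Properties conf) {
        // Default is false: no persistence, no clean-up thread, no overhead.
        active = Boolean.parseBoolean(conf.getProperty(
            "mapred.job.tracker.persist.jobstatus.active", "false"));
        if (active) {
            // Placeholder body: the real thread would periodically delete
            // persisted job info older than the configured retention.
            cleanupThread = new Thread(() -> { });
            cleanupThread.setDaemon(true);
            cleanupThread.start();
        }
    }

    public boolean isActive() { return active; }
    public boolean cleanupThreadStarted() { return cleanupThread != null; }
}
```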





[jira] Commented: (HADOOP-1876) Persisting completed jobs status

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12555927#action_12555927 ] 

Devaraj Das commented on HADOOP-1876:
-------------------------------------

Alejandro, did you evaluate the approach of tweaking the JobHistory component in Hadoop? The JobHistory component already saves the status of jobs/tasks on the local FS. It could save the history on DFS, and on top of that you would need a wrapper around it to return JobStatus objects.



[jira] Commented: (HADOOP-1876) Persisting completed jobs status

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556498#action_12556498 ] 

Runping Qi commented on HADOOP-1876:
------------------------------------

You can add counter information to JobHistory data easily. That should not be a big deal.

The JobTracker can also extract the status data from JobHistory data by job id, and make the status data
available through the JobTracker API in the same way as for other running and/or newly completed jobs.
Whether the status data is in memory or not is transparent to the client.




> Persisting completed jobs status
> --------------------------------
>
>                 Key: HADOOP-1876
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1876
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>         Environment: all
>            Reporter: Alejandro Abdelnur
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: patch1876.txt, patch1876.txt
>
>
> Currently the JobTracker keeps information about completed jobs in memory. 
> This information is flushed from the cache when it has outlived its retention interval (#RETIRE_JOB_INTERVAL) or because the limit of completed jobs in memory has been reached (#MAX_COMPLETE_USER_JOBS_IN_MEMORY). 
> Also, if the JobTracker is restarted (due to being recycled or due to a crash), information about completed jobs is lost.
> If any of the above scenarios happens before the job information is queried by a Hadoop client (normally the job submitter or a monitoring component), there is no way to obtain such information.
> A way to avoid this is for the JobTracker to persist the completed job information in DFS upon job completion. This would be done at the time the job is moved to the completed jobs queue. Then, when the JobTracker is queried for information about a completed job, if it is not found in the memory queue, a lookup in DFS would be done to retrieve the completed job information. 
> A directory in DFS (under mapred/system) would be used to persist completed job information; for each completed job there would be a directory named with the job ID, containing all the information about the job: status, job profile, counters and completion events.
> A configuration property will indicate for how long persisted job information should be kept in DFS. After that period it will be cleaned up automatically.
> This improvement would not introduce API changes.



[jira] Commented: (HADOOP-1876) Persisting completed jobs status

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557397#action_12557397 ] 

Arun C Murthy commented on HADOOP-1876:
---------------------------------------

bq. Can this patch make JobHistory log obsolete? Or at least is that intended? I hate to see same information logged at different places in different forms using different code paths.

This patch doesn't do that, but that is definitely the direction I'd go too... +1.

Should we broaden the scope of HADOOP-2178 to re-work JobHistory to use Writables rather than the custom format? Or is that a new jira?

bq. Other than being in text format (which has its pros and cons), job history log is event based [...]

Yes, moving to Writable wouldn't hurt the _job analysis_ part since, as you point out, it's event-based - we just need to use Writable.readFields rather than the custom text-parsing... does anyone see other issues?
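The Writable approach discussed here boils down to symmetric write(DataOutput)/readFields(DataInput) methods replacing custom text parsing. A minimal JDK-only sketch of that round-trip (the class and field names are hypothetical, not Hadoop's actual API):

```java
import java.io.*;

// Illustrative record following the Writable-style contract: whatever
// write() emits, readFields() consumes back in the same order.
class JobStatusRecord {
    private String jobId;
    private float mapProgress;
    private float reduceProgress;

    JobStatusRecord() {}  // no-arg constructor, needed before readFields()

    JobStatusRecord(String jobId, float mapProgress, float reduceProgress) {
        this.jobId = jobId;
        this.mapProgress = mapProgress;
        this.reduceProgress = reduceProgress;
    }

    void write(DataOutput out) throws IOException {
        out.writeUTF(jobId);
        out.writeFloat(mapProgress);
        out.writeFloat(reduceProgress);
    }

    void readFields(DataInput in) throws IOException {
        jobId = in.readUTF();
        mapProgress = in.readFloat();
        reduceProgress = in.readFloat();
    }

    String getJobId() { return jobId; }
    float getMapProgress() { return mapProgress; }
}
```

In the real code base such a record would implement org.apache.hadoop.io.Writable so the framework can serialize it directly.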





[jira] Commented: (HADOOP-1876) Persisting completed jobs status

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558168#action_12558168 ] 

Hadoop QA commented on HADOOP-1876:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12372904/patch1876.txt
against trunk revision r611315.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1548/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1548/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1548/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1548/console

This message is automatically generated.



[jira] Updated: (HADOOP-1876) Persisting completed jobs status

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur updated HADOOP-1876:
---------------------------------------

    Priority: Critical  (was: Minor)

This enhancement would allow handling outside of Hadoop what issue 1121 tries to address within Hadoop.



[jira] Commented: (HADOOP-1876) Persisting completed jobs status

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559056#action_12559056 ] 

Hudson commented on HADOOP-1876:
--------------------------------

Integrated in Hadoop-Nightly #366 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/366/])



[jira] Commented: (HADOOP-1876) Persisting completed jobs status

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556500#action_12556500 ] 

Alejandro Abdelnur commented on HADOOP-1876:
--------------------------------------------

Yes, you can do all that, but that involves many more changes. 

The JobHistory writes all job info to a LOG file. This has the following issues:

  * the write/read methods of the RunningJob elements cannot be leveraged; special writing/parsing has to be done for each of them (and would have to be done for counters).
  * the log file would have to be traversed for every job info request not found in memory; with thousands of jobs in there this would certainly slow down to a crawl.
  * if the JobHistory LOG is moved to DFS, appending becomes an issue.




[jira] Updated: (HADOOP-1876) Persisting completed jobs status

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1876:
----------------------------------

    Assignee: Alejandro Abdelnur
      Status: Open  (was: Patch Available)

bq. It seems to me that it would be much easier to retrofit the JobHistory to use info out of the files the patch is writing than the other way around.

I guess we should consider the fact that we might be better off, in the long run, moving away from the custom textual format used today by the {{JobHistory}} and going the {{Writable}} way - much less, and more standard, code. I don't believe the textual format buys us much, and it is a pain to maintain.

If folks agree, I'm okay with this patch going in as-is (oh, and yes, this is a very different use-case) and then fixing {{JobHistory}} to use {{Writable}} to serialize the necessary data-structures. Thoughts?

----

That said, some comments about the patch:

Alejandro, could you please ensure that the {{completedJobsStoreThread}} isn't _started at all_ if the feature is switched off?

Maybe we could add a boolean {{mapred.job.tracker.persist.jobstatus}} flag to turn the feature on/off.





[jira] Updated: (HADOOP-1876) Persisting completed jobs status

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur updated HADOOP-1876:
---------------------------------------

    Fix Version/s: 0.16.0
           Status: Patch Available  (was: Open)



[jira] Updated: (HADOOP-1876) Persisting completed jobs status

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur updated HADOOP-1876:
---------------------------------------

    Attachment: patch1876.txt

Changed thread handling to follow the same pattern as other threads managed by the JT.




[jira] Updated: (HADOOP-1876) Persisting completed jobs status

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur updated HADOOP-1876:
---------------------------------------

    Attachment: patch1876.txt

The CompletedJobStatusStore class performs 3 tasks:

  * persists in DFS the status/profile/counters/completion-events of a JobInProgress instance
  * reads from DFS, given a job ID, the status/profile/counters/completion-events of that job
  * runs a daemon thread that once an hour cleans up persisted jobs that have exceeded their storage time

It is configured with 2 properties:

  * the DFS directory in which to persist job info (default: '/jobtracker/jobsInfo')
  * how long job info must be kept in DFS before it is cleaned up (default: 0, which means no persistence at all)
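The hourly cleanup pass described above could look roughly like this. This is a sketch over java.io.File, assuming one directory per job ID; the actual patch would go through the Hadoop FileSystem API against DFS, and the class and method names here are hypothetical:

```java
import java.io.File;

// Deletes per-job directories whose age exceeds the configured storage time.
class JobInfoCleaner {
    private final File storeDir;          // e.g. the '/jobtracker/jobsInfo' directory
    private final long retentionMillis;   // 0 means the feature is off

    JobInfoCleaner(File storeDir, long retentionMillis) {
        this.storeDir = storeDir;
        this.retentionMillis = retentionMillis;
    }

    /** Returns the number of expired job directories removed. */
    int cleanExpired(long now) {
        if (retentionMillis <= 0) return 0;   // default config: nothing persisted, nothing to clean
        File[] jobDirs = storeDir.listFiles();
        if (jobDirs == null) return 0;
        int removed = 0;
        for (File jobDir : jobDirs) {
            if (jobDir.isDirectory() && now - jobDir.lastModified() > retentionMillis) {
                File[] contents = jobDir.listFiles();   // status, profile, counters, events
                if (contents != null) {
                    for (File f : contents) f.delete();
                }
                if (jobDir.delete()) removed++;
            }
        }
        return removed;
    }
}
```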

---
Changes to the JobTracker:

The JobTracker creates a CompletedJobStatusStore at initialization time. 

When the 'finalizeJob()' method is called, the CompletedJobStatusStore's 'store()' method is called to persist the job information.

The getJobProfile()/getJobStatus()/getCounters()/getTaskCompletionEvents() methods call the corresponding get*() method of the CompletedJobStatusStore if the in-memory queues don't have information about the requested job ID. 
---
It includes a testcase that verifies the persistence of job info across JobTracker restarts
---
As job IDs include the JobTracker startup timestamp, there is no risk of job ID collisions. And since the directory for persisting the job info is configurable, the same DFS can be used by multiple JobTrackers without any risk of job ID collisions across JobTrackers, even in the unlikely case that their startup timestamps are identical.
---
The default behavior is a persistence time of 0 for job info; thus nothing is written to DFS under the default configuration, the store()/read() methods are no-ops and the daemon thread is not started.
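The memory-first lookup order described above (in-memory queue first, persisted store only on a miss) can be sketched as follows; StatusLookup, JobStore and the String-typed status are simplifications for illustration, not the patch's actual types:

```java
import java.util.HashMap;
import java.util.Map;

// Memory-first lookup with a fall-back to the persisted store, mirroring
// the getJobStatus()-style behavior described in the patch comment.
class StatusLookup {
    // Stand-in for the DFS-backed CompletedJobStatusStore read path.
    interface JobStore {
        String read(String jobId);   // returns null when nothing was persisted
    }

    private final Map<String, String> inMemory = new HashMap<>();
    private final JobStore store;

    StatusLookup(JobStore store) { this.store = store; }

    void cacheStatus(String jobId, String status) { inMemory.put(jobId, status); }

    /** In-memory queue first; the persisted store only on a miss. */
    String lookupStatus(String jobId) {
        String status = inMemory.get(jobId);
        return status != null ? status : store.read(jobId);
    }
}
```

With the default persistence time of 0, store.read() would behave as a no-op and simply return null, so clients see exactly today's behavior.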




[jira] Updated: (HADOOP-1876) Persisting completed jobs status

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur updated HADOOP-1876:
---------------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (HADOOP-1876) Persisting completed jobs status

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558392#action_12558392 ] 

Hudson commented on HADOOP-1876:
--------------------------------

Integrated in Hadoop-Nightly #364 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/364/])

> Persisting completed jobs status
> --------------------------------
>
>                 Key: HADOOP-1876
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1876
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>         Environment: all
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: patch1876.txt, patch1876.txt, patch1876.txt



[jira] Commented: (HADOOP-1876) Persisting completed jobs status

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552421 ] 

Hadoop QA commented on HADOOP-1876:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12371789/patch1876.txt
against trunk revision r604451.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1365/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1365/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1365/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1365/console

This message is automatically generated.

> Persisting completed jobs status
> --------------------------------
>
>                 Key: HADOOP-1876
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1876
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>         Environment: all
>            Reporter: Alejandro Abdelnur
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: patch1876.txt, patch1876.txt



[jira] Commented: (HADOOP-1876) Persisting completed jobs status

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552349 ] 

Hadoop QA commented on HADOOP-1876:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12371779/patch1876.txt
against trunk revision r604451.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs -1.  The patch appears to introduce 1 new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1363/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1363/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1363/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1363/console

This message is automatically generated.

> Persisting completed jobs status
> --------------------------------
>
>                 Key: HADOOP-1876
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1876
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>         Environment: all
>            Reporter: Alejandro Abdelnur
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: patch1876.txt
