Posted to common-dev@hadoop.apache.org by "Amar Kamat (JIRA)" <ji...@apache.org> on 2009/03/03 10:44:57 UTC

[jira] Commented: (HADOOP-4670) Improve the way job history files are managed

    [ https://issues.apache.org/jira/browse/HADOOP-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678270#action_12678270 ] 

Amar Kamat commented on HADOOP-4670:
------------------------------------

I had an offline discussion with Devaraj, Hemanth and Sharad. It seems the following structure should solve this issue:
# old history files : path-to-job-history/
# history files for jobtracker on host hostname: path-to-job-history/hostname
# history files for user username using jobtracker running on hostname: path-to-job-history/hostname/username
# job history file format : <start-time>_<jobid>_<jobname>
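
To make the proposed layout concrete, here is a minimal sketch of how a history file path could be assembled and how a file name could be split back into its fields. It is only an illustration of the structure listed above, not the JobHistory implementation: the function names are hypothetical, and the parser assumes job ids of the form job_<digits>_<digits> so that underscores inside the job name do not confuse the split.

```python
import posixpath
import re

# Assumption: job ids look like job_<digits>_<digits>; anything after the
# job id and its trailing underscore is treated as the job name.
_NAME_RE = re.compile(r"^(\d+)_(job_\d+_\d+)_(.*)$")


def history_file_name(start_time, job_id, job_name):
    """Build <start-time>_<jobid>_<jobname> (hypothetical helper)."""
    return f"{start_time}_{job_id}_{job_name}"


def history_file_path(root, hostname, username, start_time, job_id, job_name):
    """Build path-to-job-history/hostname/username/<file name>."""
    return posixpath.join(
        root, hostname, username,
        history_file_name(start_time, job_id, job_name),
    )


def parse_history_file_name(name):
    """Return (start_time, job_id, job_name), or None if the name does not match."""
    match = _NAME_RE.match(name)
    if match is None:
        return None
    return int(match.group(1)), match.group(2), match.group(3)
```

Old history files that sit directly under path-to-job-history/ would not match this pattern unless they already follow the new naming, so a reader of the directory would have to treat them separately.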

Structuring it further by year, month and day might prove useful, but for now it looks like a premature step. If needed, we can add it later. Without date-level folders, users who submit jobs at a very high rate will end up with larger per-user folders and so will be affected more than users who submit jobs less frequently. Searching per user will be easier.
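
As a rough illustration of why per-user search gets cheaper, the sketch below lists one user's history files by reading only that user's folder, ordered by the leading start-time field. It uses the local filesystem as a stand-in for wherever the history actually lives, and the function name is hypothetical.

```python
import os


def list_user_history(root, hostname, username):
    """Return one user's history file names, oldest first (hypothetical helper)."""
    user_dir = os.path.join(root, hostname, username)
    if not os.path.isdir(user_dir):
        return []

    names = [
        name for name in os.listdir(user_dir)
        if os.path.isfile(os.path.join(user_dir, name))
    ]

    def start_time(name):
        # The file name begins with <start-time>_, so sort numerically on it.
        prefix = name.split("_", 1)[0]
        return int(prefix) if prefix.isdigit() else 0

    return sorted(names, key=start_time)
```

Only the files of the requested user are touched, so the cost depends on how many jobs that user has submitted rather than on the total number of jobs in the history folder.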

Future steps :
1) Add date-level info in the structuring, or at least in the display
2) Add indexing info for faster access/display
3) Provide various views such as recent jobs, sorting by day/week/month/year or jobname (sorting and structuring) etc.
4) Secure access
5) Faster access and analysis (involves changes/tweaks to JobHistory and parsing).

Thoughts?

> Improve the way job history files are managed
> ---------------------------------------------
>
>                 Key: HADOOP-4670
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4670
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.20.0
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>
> Today all the jobhistory files are dumped in one _job-history_ folder. This can cause problems when there is a need to search the history folder (job-recovery etc). It would be nice if we grouped all the jobs under a _user_ folder, so all the jobs for user _amar_ would go in _history-folder/amar/_. Jobs can be categorized using various features like _jobid, date, jobname_ etc, but using _username_ will make the search much more efficient and will not result in namespace explosion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.