Posted to mapreduce-issues@hadoop.apache.org by "Ujjawal Kumar (Jira)" <ji...@apache.org> on 2022/10/06 11:41:00 UTC

[jira] [Updated] (MAPREDUCE-7410) Expose API to get task ids and individual task report given task Id from org.apache.hadoop.mapreduce.Job

     [ https://issues.apache.org/jira/browse/MAPREDUCE-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ujjawal Kumar updated MAPREDUCE-7410:
-------------------------------------
    Description: 
Currently, org.apache.hadoop.mapreduce.Job exposes the getTaskReports(TaskType) API to fetch the task reports of either mappers or reducers. However, for MR jobs with a large number of tasks, fetching all task reports at once causes OOM issues, as seen with JHS (HistoryClientService.getTaskReports). HistoryClientService also exposes an API, getTaskReport(), where a TaskId can be provided within the GetTaskReportRequest. org.apache.hadoop.mapreduce.Job could expose two APIs so that individual task reports can be fetched after listing the task IDs from the client side:
 # Job.getTasks(TaskType) -> List<TaskId> - This would return the TaskIds of all tasks of the given type to the client
 # Job.getTaskReport(TaskId) -> TaskReport - This would return the task report for a single task to the client
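
The client-side pattern these two calls would enable can be sketched as follows. This is only an illustration under stated assumptions: TaskType, TaskId, TaskReport, TaskReportSource and InMemorySource below are hypothetical stand-ins written for this example, not the real Hadoop classes, and the two methods are the ones proposed above, which do not exist on Job today.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// All types here are hypothetical stand-ins, NOT the real Hadoop classes.
class PerTaskReportSketch {
    enum TaskType { MAP, REDUCE }

    static final class TaskId {
        final TaskType type;
        final int index;
        TaskId(TaskType type, int index) { this.type = type; this.index = index; }
        @Override public boolean equals(Object o) {
            return o instanceof TaskId
                && ((TaskId) o).type == type && ((TaskId) o).index == index;
        }
        @Override public int hashCode() { return Objects.hash(type, index); }
    }

    static final class TaskReport {
        final TaskId id;
        final String state;
        TaskReport(TaskId id, String state) { this.id = id; this.state = state; }
    }

    // The two calls proposed in this issue, modelled as an interface.
    interface TaskReportSource {
        List<TaskId> getTasks(TaskType type);
        TaskReport getTaskReport(TaskId id);
    }

    // Assumed in-memory backend, used only to make the sketch runnable.
    static final class InMemorySource implements TaskReportSource {
        private final Map<TaskId, TaskReport> reports = new LinkedHashMap<>();
        void add(TaskType type, int index, String state) {
            TaskId id = new TaskId(type, index);
            reports.put(id, new TaskReport(id, state));
        }
        @Override public List<TaskId> getTasks(TaskType type) {
            List<TaskId> ids = new ArrayList<>();
            for (TaskId id : reports.keySet()) {
                if (id.type == type) ids.add(id);
            }
            return ids;
        }
        @Override public TaskReport getTaskReport(TaskId id) { return reports.get(id); }
    }

    // List the IDs first, then fetch one report at a time, so the client
    // never holds more than a single report in memory.
    static int countInState(TaskReportSource job, TaskType type, String state) {
        int count = 0;
        for (TaskId id : job.getTasks(type)) {
            TaskReport report = job.getTaskReport(id);
            if (report != null && state.equals(report.state)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        InMemorySource job = new InMemorySource();
        job.add(TaskType.MAP, 0, "SUCCEEDED");
        job.add(TaskType.MAP, 1, "FAILED");
        job.add(TaskType.REDUCE, 0, "FAILED");
        System.out.println(countInState(job, TaskType.MAP, "FAILED"));
    }
}
```

The trade-off is one round trip per task instead of a single large response, so a real client would likely want to bound or batch these calls.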

For JHS, since JobHistoryParser.parse already parses the full history file by default and maintains the list of tasks within JobHistoryParser.JobInfo's tasksMap, this information should be easy to get.

One open question is whether this can also be supported for requests that are redirected to MRClientService (within MRAppMaster) for running jobs.

!Screenshot 2022-10-06 at 4.46.48 PM.png!


> Expose API to get task ids and individual task report given task Id from org.apache.hadoop.mapreduce.Job
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7410
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7410
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver, yarn
>            Reporter: Ujjawal Kumar
>            Priority: Minor
>         Attachments: Screenshot 2022-10-06 at 4.46.48 PM.png


