Posted to mapreduce-issues@hadoop.apache.org by "Ray Chiang (JIRA)" <ji...@apache.org> on 2015/06/09 23:58:00 UTC
[jira] [Commented] (MAPREDUCE-6376) Fix long load times of .jhist file in JobHistoryServer
[ https://issues.apache.org/jira/browse/MAPREDUCE-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579634#comment-14579634 ]
Ray Chiang commented on MAPREDUCE-6376:
---------------------------------------
A few comments:
1) It turns out that Avro parsing is anywhere from 70% to 90% of the .jhist processing time. Some data points for the json .jhist file:
- 50k mappers
-- 20 seconds overall read time
-- 16.6 seconds Avro parsing/reading
- 404k mappers
-- 68 seconds overall read time
-- 49 seconds Avro parsing/reading
- 751k mappers
-- 300 seconds overall read time
-- 280 seconds Avro parsing/reading
2) I couldn't get access to a machine to generate jobs with more than 50k mappers, but my rough experiments showed about a 4x to 5x speedup in Avro parsing/reading. For the worst-case improvement on 751k mappers, I would expect the 300 seconds of processing time to come down to about 90 seconds. There is room to shave the processing time by a few seconds here and there, but that's probably better left to subsequent JIRAs.
3) The .jhist file output format is now a configuration option, with the default set to json.
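The 90-second figure in 2) matches the low end of the 4x to 5x range if the speedup is applied only to the Avro portion of the 751k-mapper run. Below is a rough sketch of that arithmetic using the timings from 1). It assumes (this is not stated in the comment) that the non-Avro time stays unchanged; the function names are illustrative only.

```python
# Timings quoted in 1): (mappers, overall read seconds, Avro parsing seconds).
DATA_POINTS = [
    (50_000, 20.0, 16.6),
    (404_000, 68.0, 49.0),
    (751_000, 300.0, 280.0),
]


def avro_share(overall_s, avro_s):
    """Fraction of the overall read time spent in Avro parsing/reading."""
    return avro_s / overall_s


def projected_overall(overall_s, avro_s, speedup):
    """Projected overall time if only the Avro portion gets `speedup` times faster.

    Assumption: the remaining (non-Avro) time is unaffected by the change.
    """
    return (overall_s - avro_s) + avro_s / speedup


if __name__ == "__main__":
    for mappers, overall_s, avro_s in DATA_POINTS:
        share = avro_share(overall_s, avro_s)
        low = projected_overall(overall_s, avro_s, 4)   # 4x speedup
        high = projected_overall(overall_s, avro_s, 5)  # 5x speedup
        print(f"{mappers} mappers: Avro share {share:.0%}, "
              f"projected {high:.1f}s to {low:.1f}s")
```

With a 4x speedup the 751k-mapper case works out to 20 + 280/4 = 90 seconds, and with 5x to 20 + 280/5 = 76 seconds.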
> Fix long load times of .jhist file in JobHistoryServer
> ------------------------------------------------------
>
> Key: MAPREDUCE-6376
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6376
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver
> Affects Versions: 2.7.0
> Reporter: Ray Chiang
> Assignee: Ray Chiang
> Attachments: MAPREDUCE-6376.001.patch
>
>
> When you click on a Job link in the JHS Web UI, it loads the .jhist file. For jobs which have a large number of tasks, the load time can break UI responsiveness.