Posted to mapreduce-issues@hadoop.apache.org by "Zhijie Shen (JIRA)" <ji...@apache.org> on 2014/09/05 23:53:28 UTC

[jira] [Commented] (MAPREDUCE-5933) Enable MR AM to post history events to the timeline server

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123658#comment-14123658 ] 

Zhijie Shen commented on MAPREDUCE-5933:
----------------------------------------

Hi [~rkanter], thanks for taking care of this work. I just scanned through the patch. The approach looks fine to me in general. Here are some comments:

1. Should we make publishing MR job information to the timeline server configurable? I mean a config specific to MR jobs.

2. Is it better to do the put entity operation on a separate thread? JobHistoryHandler seems to run on the main dispatcher thread. We'd better not block other event processing.

3. When an error occurs while publishing MR job information, IMHO, it shouldn't fail the MR job. The arguable point is that the MR job relies on job history to determine the final status, and even for recovery. Once we rebase the job history server on the data in the timeline store, we should have some way to know when the job history is (partly) missing or corrupted.

4. Another choice is to buffer the events locally and push them all at once when the job is done/committed. This is the more conservative way, as we did for the job history file. OTOH, publishing each event immediately may provide realtime/near-realtime job monitoring. We may want to think more about the choice here. For example, if the job crashes in the middle, the timeline server is going to have a partial history for an MR job.

5. If we want to visualize the counter details in the JSON output, it's good to build a nested JSON data structure. An alternative, efficient way is to use the Writable interface to ser/deser the counters into/from bytes.

6. Are all HistoryEvent subclasses properly handled? I randomly searched for TaskFailedEvent, which does not seem to be dealt with. It seems that some of them are only used for Rumen, such as JobStatusChangedEvent and TaskUpdatedEvent. These update events may be problematic if they're very frequent.
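
To make points 1 to 3 concrete, here is a minimal sketch in plain Java with no Hadoop dependencies. It is only an illustration under stated assumptions, not what the patch does: AsyncTimelinePublisher and TimelineSink are hypothetical names, the sink stands in for whatever call actually puts an entity into the timeline server, and the boolean flag stands in for an MR-specific config whose key is not defined here. Events are handed to a queue from the dispatcher thread, a single worker thread does the puts, and a failed put is counted rather than propagated.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: publishes events off the dispatcher thread. */
class AsyncTimelinePublisher implements AutoCloseable {

  /** Hypothetical stand-in for the real "put entity" call. */
  public interface TimelineSink {
    void put(String event) throws Exception;
  }

  // Unbounded queue, so offer() never blocks the caller.
  private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
  private final AtomicLong failed = new AtomicLong();
  private final TimelineSink sink;
  private final boolean enabled;   // point 1: publishing can be switched off
  private final Thread worker;
  private volatile boolean stopped = false;

  AsyncTimelinePublisher(TimelineSink sink, boolean enabled) {
    this.sink = sink;
    this.enabled = enabled;
    this.worker = new Thread(this::drain, "timeline-publisher");
    this.worker.setDaemon(true);
    if (enabled) {
      this.worker.start();
    }
  }

  /** Point 2: called on the dispatcher thread; only enqueues. */
  void handle(String event) {
    if (enabled && !stopped) {
      queue.offer(event);
    }
  }

  /** Worker loop: keeps going until stopped and the queue is empty. */
  private void drain() {
    while (!stopped || !queue.isEmpty()) {
      try {
        String event = queue.poll(100, TimeUnit.MILLISECONDS);
        if (event == null) {
          continue;
        }
        try {
          sink.put(event);
        } catch (Exception e) {
          // Point 3: a publishing error is recorded, never rethrown,
          // so it cannot fail the job.
          failed.incrementAndGet();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }

  /** Number of events that could not be published. */
  long failedCount() {
    return failed.get();
  }

  /** Stops accepting events and waits for queued ones to be published. */
  @Override
  public void close() {
    stopped = true;
    if (enabled) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }
}
```

The failure counter is one possible way to tell later that the published history is (partly) missing. The buffer-then-push alternative in point 4 would keep the same handle() method and defer every put until close().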

> Enable MR AM to post history events to the timeline server
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-5933
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5933
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: mr-am
>            Reporter: Zhijie Shen
>            Assignee: Robert Kanter
>         Attachments: MAPREDUCE-5933.patch, MAPREDUCE-5933.patch, MAPREDUCE-5933.patch, mr_timelineserver_response.txt
>
>
> Nowadays, MR AM collects the history events and writes them to HDFS for JHS to source. With the timeline server, MR AM can put these events there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)