Posted to issues@tez.apache.org by "Jeff Zhang (JIRA)" <ji...@apache.org> on 2015/04/14 20:24:12 UTC

[jira] [Comment Edited] (TEZ-2319) DAG history in HDFS

    [ https://issues.apache.org/jira/browse/TEZ-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494529#comment-14494529 ] 

Jeff Zhang edited comment on TEZ-2319 at 4/14/15 6:24 PM:
----------------------------------------------------------

[~rohini] 
* Regarding the jobconf.xml, I think it is serialized into the payload of the Processor, and is already in the dagPlan of DAGSubmittedEvent.
* Regarding the job history data, are TaskAttemptFinishedEvent & TaskFinishedEvent sufficient for you?
BTW, shouldn't the job history data analysis be based on the ATS? Currently the data written to HDFS is only for recovery.


was (Author: zjffdu):
[~rohini] Should the job history data analysis be based on the ATS? Currently the data written to HDFS is only for recovery. If all the job history details are also written to HDFS, it looks a little redundant.

> DAG history in HDFS
> -------------------
>
>                 Key: TEZ-2319
>                 URL: https://issues.apache.org/jira/browse/TEZ-2319
>             Project: Apache Tez
>          Issue Type: New Feature
>            Reporter: Rohini Palaniswamy
>
>   We have processes that parse jobconf.xml and job history details (map and reduce task details, etc.) in Avro files from HDFS and load them into Hive tables for analysis of MapReduce jobs. We would like Tez to also write this information to a history file in HDFS when the AM or each DAG completes, so that we can do analytics on Tez jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)