Posted to issues@spark.apache.org by "Chengxiang Li (JIRA)" <ji...@apache.org> on 2014/09/04 10:02:51 UTC
[jira] [Commented] (SPARK-2321) Design a proper progress reporting & event listener API

    [ https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121085#comment-14121085 ] 

Chengxiang Li commented on SPARK-2321:
--------------------------------------

I have collected some Hive-side requirements here, which should be helpful for the design of the Spark job status and statistics API.

Hive should be able to get the following job status information through the Spark job status API:
1. Job identifier.
2. Current job execution state, which should include RUNNING/SUCCEEDED/FAILED/KILLED.
3. Number of running/failed/killed/total tasks at the job level.
4. Stage identifier.
5. Stage state, which should include RUNNING/SUCCEEDED/FAILED/KILLED.
6. Number of running/failed/killed/total tasks at the stage level.
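
To make the six items above concrete, here is a rough sketch of how such status information might be modeled. All type, field, and method names (JobStatusSketch, ExecutionState, TaskCounts, StageStatus, JobStatus) are hypothetical and are not part of any existing Spark or Hive API. The integer identifiers and the choice to derive job-level task counts by summing stage-level counts are also assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only: every name here is hypothetical and is not
// part of any existing Spark or Hive API.
class JobStatusSketch {

    // Items 2 and 5: the states requested for both jobs and stages.
    enum ExecutionState { RUNNING, SUCCEEDED, FAILED, KILLED }

    // Items 3 and 6: task counts, reusable at job level and stage level.
    static final class TaskCounts {
        final int running;
        final int failed;
        final int killed;
        final int total;

        TaskCounts(int running, int failed, int killed, int total) {
            this.running = running;
            this.failed = failed;
            this.killed = killed;
            this.total = total;
        }

        // Field-wise sum of two sets of counts.
        TaskCounts plus(TaskCounts other) {
            return new TaskCounts(running + other.running, failed + other.failed,
                    killed + other.killed, total + other.total);
        }
    }

    // Items 4 to 6: identifier, state, and task counts of one stage.
    static final class StageStatus {
        final int stageId; // assumption: integer identifiers
        final ExecutionState state;
        final TaskCounts tasks;

        StageStatus(int stageId, ExecutionState state, TaskCounts tasks) {
            this.stageId = stageId;
            this.state = state;
            this.tasks = tasks;
        }
    }

    // Items 1 to 3: identifier and state of one job, plus its stages.
    static final class JobStatus {
        final int jobId; // assumption: integer identifiers
        final ExecutionState state;
        final List<StageStatus> stages;

        JobStatus(int jobId, ExecutionState state, List<StageStatus> stages) {
            this.jobId = jobId;
            this.state = state;
            // Defensive copy so callers cannot mutate the status afterwards.
            this.stages = Collections.unmodifiableList(new ArrayList<>(stages));
        }

        // Assumption: job-level counts are the sum of the stage-level counts.
        TaskCounts tasks() {
            TaskCounts sum = new TaskCounts(0, 0, 0, 0);
            for (StageStatus stage : stages) {
                sum = sum.plus(stage.tasks);
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        List<StageStatus> stages = new ArrayList<>();
        stages.add(new StageStatus(0, ExecutionState.SUCCEEDED, new TaskCounts(0, 0, 0, 4)));
        stages.add(new StageStatus(1, ExecutionState.RUNNING, new TaskCounts(3, 1, 0, 8)));
        JobStatus job = new JobStatus(7, ExecutionState.RUNNING, stages);

        TaskCounts t = job.tasks();
        System.out.println("job " + job.jobId + " " + job.state + ": "
                + t.running + " running, " + t.failed + " failed, "
                + t.killed + " killed, " + t.total + " total");
    }
}
```

Keeping the status objects immutable is one possible way to give a client such as Hive a consistent snapshot that it cannot mutate.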

MR/Tez use Counters to collect statistics. Similar to MR/Tez Counters, it would be better if the Spark job statistics API organized statistics as follows:
1. Group statistics of the same kind by groupName.
2. Provide a displayName for both groups and individual statistics, so that frontends (Web UI/Hive CLI/...) print uniform strings.
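
As an illustration of the two points above, here is a minimal sketch of statistics grouped by groupName, with a displayName on both the group and each statistic. The names StatisticSketch, Statistic, StatisticGroup, and render are hypothetical, as are the example group "shuffle" and its statistics. None of this is an existing Spark API, and the use of long values is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: every name here is hypothetical and is not
// part of any existing Spark, Hive, MR, or Tez API.
class StatisticSketch {

    // One statistic: an internal name, a printable name, and a value.
    static final class Statistic {
        final String name;
        final String displayName;
        final long value; // assumption: statistics are long-valued

        Statistic(String name, String displayName, long value) {
            this.name = name;
            this.displayName = displayName;
            this.value = value;
        }
    }

    // Point 1: statistics of the same kind are collected under one groupName.
    static final class StatisticGroup {
        final String groupName;
        final String displayName;
        // LinkedHashMap keeps insertion order so output is stable.
        private final Map<String, Statistic> statistics = new LinkedHashMap<>();

        StatisticGroup(String groupName, String displayName) {
            this.groupName = groupName;
            this.displayName = displayName;
        }

        void put(String name, String displayName, long value) {
            statistics.put(name, new Statistic(name, displayName, value));
        }

        // Returns null when the group has no statistic with this name.
        Statistic get(String name) {
            return statistics.get(name);
        }
    }

    // Point 2: any frontend can print the same strings by using only the
    // displayName values, never the internal names.
    static String render(StatisticGroup group) {
        StringBuilder sb = new StringBuilder(group.displayName).append('\n');
        for (Statistic s : group.statistics.values()) {
            sb.append("  ").append(s.displayName).append(": ").append(s.value).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        StatisticGroup shuffle = new StatisticGroup("shuffle", "Shuffle");
        shuffle.put("bytesRead", "Bytes read", 1024L);
        shuffle.put("bytesWritten", "Bytes written", 2048L);
        System.out.print(render(shuffle));
    }
}
```

Separating the internal name from the displayName lets a client look up a value by a stable key while the Web UI and the Hive CLI share one printable label.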


> Design a proper progress reporting & event listener API
> -------------------------------------------------------
>
>                 Key: SPARK-2321
>                 URL: https://issues.apache.org/jira/browse/SPARK-2321
>             Project: Spark
>          Issue Type: Improvement
>          Components: Java API, Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>            Priority: Critical
>
> This is a ticket to track progress on redesigning the SparkListener and JobProgressListener API.
> There are multiple problems with the current design, including:
> 0. I'm not sure if the API is usable in Java (there are at least some enums we used in Scala and a bunch of case classes that might complicate things).
> 1. The whole API is marked as DeveloperApi, because we haven't paid a lot of attention to it yet. Something as important as progress reporting deserves a more stable API.
> 2. There is no easy way to connect jobs with stages. Similarly, there is no easy way to connect job groups with jobs / stages.
> 3. JobProgressListener itself has no encapsulation at all. States can be arbitrarily mutated by external programs. Variable names are sort of randomly decided and inconsistent. 
> We should just revisit these and propose a new, concrete design. 


