Posted to dev@oozie.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2013/03/01 21:31:13 UTC

[jira] [Commented] (OOZIE-1244) SLA Support in Oozie

    [ https://issues.apache.org/jira/browse/OOZIE-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590907#comment-13590907 ] 

Rohini Palaniswamy commented on OOZIE-1244:
-------------------------------------------

I would like to put forth two possible approaches for discussion. The first approach involves making SLA support part of core Oozie. The second involves packaging it as a separate war file for isolation, which can still be hosted in the same Tomcat as Oozie. Both approaches involve:
   * Building on top of the event system work being done in OOZIE-1209.
   * Reusing the current sla_events table, which holds SLA registration events and job status events.
   * Having a SLACalculationService that receives SLA registration events for submitted workflows or materialized coordinator actions. The registered information will be kept in memory (with intelligent overflow to disk based on expected start and end times) and checked periodically (let's say every 1 or 2 minutes) for SLA misses. Elements will be removed from memory once all possible SLA events for them have been generated. Apart from the periodic check, job status events will also be used to do the calculation as they are received. The generated SLA events will be sent to the EventHandlerService, which will forward them to the registered SLAEventListeners.
   * Three implementations of SLAEventListener: one to send JMS notifications, one to do email alerting, and one to write the SLA events to a database table. Let's call this table sla_info to avoid confusion with the existing sla_events table. The SLA information in the database can be used for querying and for building a dashboard.
   * REST APIs to query the sla_info table.
   * A simple dashboard to view and filter historical SLA information.
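
To make the in-memory bookkeeping concrete, here is a rough sketch of one pass of the periodic check. Everything in it is an assumption for illustration: SlaEntry, SlaCalculator, the method names and the "jobId:TYPE" event strings are made up and are not existing Oozie classes. For brevity, status events only record timestamps here and the calculation happens in check(); overflow to disk, duration misses and thread safety are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types are existing Oozie classes.
final class SlaEntry {
    final String jobId;
    final long expectedStartMs;
    final long expectedEndMs;
    Long actualStartMs;   // null until a "job started" status event arrives
    Long actualEndMs;     // null until a "job ended" status event arrives
    boolean startDecided; // true once the start outcome has been evaluated
    boolean endDecided;   // true once END_MET or END_MISS has been emitted

    SlaEntry(String jobId, long expectedStartMs, long expectedEndMs) {
        this.jobId = jobId;
        this.expectedStartMs = expectedStartMs;
        this.expectedEndMs = expectedEndMs;
    }
}

final class SlaCalculator {
    private final Map<String, SlaEntry> pending = new HashMap<>();

    void register(String jobId, long expectedStartMs, long expectedEndMs) {
        pending.put(jobId, new SlaEntry(jobId, expectedStartMs, expectedEndMs));
    }

    void onJobStarted(String jobId, long atMs) {
        SlaEntry e = pending.get(jobId);
        if (e != null) e.actualStartMs = atMs;
    }

    void onJobEnded(String jobId, long atMs) {
        SlaEntry e = pending.get(jobId);
        if (e != null) e.actualEndMs = atMs;
    }

    int pendingCount() {
        return pending.size();
    }

    // One pass of the periodic check; returns events as "jobId:TYPE" strings.
    List<String> check(long nowMs) {
        List<String> events = new ArrayList<>();
        Iterator<SlaEntry> it = pending.values().iterator();
        while (it.hasNext()) {
            SlaEntry e = it.next();
            if (!e.startDecided) {
                if (e.actualStartMs != null) {
                    // Start time is known: a miss only if it was late.
                    if (e.actualStartMs > e.expectedStartMs) events.add(e.jobId + ":START_MISS");
                    e.startDecided = true;
                } else if (nowMs > e.expectedStartMs) {
                    // Not started yet and the expected start has passed.
                    events.add(e.jobId + ":START_MISS");
                    e.startDecided = true;
                }
            }
            if (!e.endDecided) {
                if (e.actualEndMs != null) {
                    events.add(e.jobId + (e.actualEndMs > e.expectedEndMs ? ":END_MISS" : ":END_MET"));
                    e.endDecided = true;
                } else if (nowMs > e.expectedEndMs) {
                    events.add(e.jobId + ":END_MISS");
                    e.endDecided = true;
                }
            }
            // Assumes expectedStart <= expectedEnd and that an ended job has a
            // recorded start, so nothing more can be generated after this point.
            if (e.endDecided) it.remove();
        }
        return events;
    }
}
```

In a real service check() would be driven by a scheduler at the chosen interval, and the returned events would be handed to the EventHandlerService rather than returned as strings.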

Design 1 (Core Oozie): 
   * If SLACalculationService is in core Oozie, it will implement WorkflowEventListener and CoordinatorEventListener and receive materialization events and job status events through them. For recovery after a restart, it will query the SLA_EVENTS table to get the registered events.
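
A very rough sketch of how that could look, only to illustrate the shape. The two listener interfaces are still being defined in OOZIE-1209, so the method signatures below are stand-ins I made up, as are SlaRegistrationStore (standing in for the query on SLA_EVENTS) and CoreSlaCalculationService.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in interfaces with made-up signatures; the real ones from
// OOZIE-1209 may look different.
interface WorkflowEventListener {
    void onWorkflowJobEvent(String jobId, String status);
}

interface CoordinatorEventListener {
    void onCoordinatorActionEvent(String actionId, String status);
}

// Hypothetical abstraction over "query the SLA_EVENTS table for registrations".
interface SlaRegistrationStore {
    List<String> loadRegisteredJobIds();
}

final class CoreSlaCalculationService implements WorkflowEventListener, CoordinatorEventListener {
    // Last known status per job or action that has an SLA registration.
    private final Map<String, String> lastStatus = new HashMap<>();

    // Recovery on restart: rebuild the in-memory state from the store.
    CoreSlaCalculationService(SlaRegistrationStore store) {
        for (String id : store.loadRegisteredJobIds()) {
            lastStatus.put(id, "REGISTERED");
        }
    }

    @Override
    public void onWorkflowJobEvent(String jobId, String status) {
        record(jobId, status);
    }

    @Override
    public void onCoordinatorActionEvent(String actionId, String status) {
        record(actionId, status);
    }

    // Events for jobs without an SLA registration are ignored.
    private void record(String id, String status) {
        if (lastStatus.containsKey(id)) {
            lastStatus.put(id, status);
        }
    }

    String statusOf(String id) {
        return lastStatus.get(id);
    }
}
```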
    
Pros:
   * Part of core Oozie. Easy to configure and set up.
   * Acting on job events as they arrive makes SLA processing faster than periodic polling.
   * The existing framework for JMS and DB access can be reused.
  
Cons:
   * SLA processing will consume CPU cycles in the Oozie server, which may affect core functionality.

Design 2 (Separate Service): 
   * The service will reuse the oozie-core framework: EventHandlerService, JMSAccessorService (JMS notifications), JPAService (writing sla_info to the database), etc.
   * SLACalculationService here will rely mainly on fetching registration and status events at a regular interval from the SLA_EVENTS table of Oozie (bulk fetch by sequence id) through REST APIs.
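
The bulk fetch by sequence id could work roughly like the sketch below. All names are hypothetical: SlaEventSource stands in for whatever REST call returns rows with a sequence id greater than the last one seen, and InMemorySlaEventSource only exists so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row of the SLA_EVENTS table as seen by the separate service.
final class SlaEventRow {
    final long sequenceId;
    final String payload;

    SlaEventRow(long sequenceId, String payload) {
        this.sequenceId = sequenceId;
        this.payload = payload;
    }
}

// Stand-in for the REST call: rows with sequenceId > lastSequenceId,
// in ascending order, at most maxRows of them.
interface SlaEventSource {
    List<SlaEventRow> fetchAfter(long lastSequenceId, int maxRows);
}

// In-memory source used only to make the sketch runnable.
final class InMemorySlaEventSource implements SlaEventSource {
    final List<SlaEventRow> table = new ArrayList<>();

    @Override
    public List<SlaEventRow> fetchAfter(long lastSequenceId, int maxRows) {
        List<SlaEventRow> out = new ArrayList<>();
        for (SlaEventRow r : table) {
            if (r.sequenceId > lastSequenceId && out.size() < maxRows) out.add(r);
        }
        return out;
    }
}

final class SlaEventPoller {
    private final SlaEventSource source;
    private final int batchSize;
    private long lastSequenceId;

    SlaEventPoller(SlaEventSource source, long startAfterSequenceId, int batchSize) {
        this.source = source;
        this.lastSequenceId = startAfterSequenceId;
        this.batchSize = batchSize;
    }

    // One polling cycle: keep fetching batches until a short batch shows
    // that the poller has caught up with the table.
    List<SlaEventRow> pollOnce() {
        List<SlaEventRow> all = new ArrayList<>();
        while (true) {
            List<SlaEventRow> batch = source.fetchAfter(lastSequenceId, batchSize);
            for (SlaEventRow row : batch) {
                all.add(row);
                lastSequenceId = Math.max(lastSequenceId, row.sequenceId);
            }
            if (batch.size() < batchSize) break;
        }
        return all;
    }

    long lastSequenceId() {
        return lastSequenceId;
    }
}
```

Remembering the last sequence id means each cycle only pulls new rows, but every cycle still costs at least one query against the Oozie DB even when nothing has changed.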
   
Pros:
   * Isolation from core. Ability to host as a separate service if needed for performance. 
   * Ability to consolidate SLA information from multiple clusters in the future. This is just a thought; we might not do it.

Cons:
   * More load on the Oozie core DB due to the frequent DB calls, even if we are fetching based on sequence id.
   * Possible slight delay in SLA calculations, as we do a bulk fetch at periodic intervals instead of acting immediately on events.
   * Dev work to create a separate service.
   * Adds complexity to setup and deployment.
   
   One option would be to have the code in a separate module, with build profiles to build it either as part of the core Oozie war or as a separate war, making it a deployment choice.
                
> SLA Support in Oozie
> --------------------
>
>                 Key: OOZIE-1244
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1244
>             Project: Oozie
>          Issue Type: New Feature
>            Reporter: Rohini Palaniswamy
>
>   Would like to have the following features in Oozie
>  - JMS notifications on SLA met, SLA start miss, SLA end miss and SLA duration miss
>  - Email alerting for SLA start miss, SLA end miss and SLA duration miss
>  - API to query SLA met/miss information. Currently the SLA information that can be queried is only SLA registration event and job status events. One has to calculate the actual misses from those. 
>  - A simple dashboard to view and query the SLA met/miss information built on the API mentioned above.
