Posted to dev@apex.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/02/24 13:00:45 UTC

[jira] [Commented] (APEXCORE-602) Provide a "group-id" in the event object so that events are grouped together by a "root cause".

    [ https://issues.apache.org/jira/browse/APEXCORE-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15882623#comment-15882623 ] 

ASF GitHub Bot commented on APEXCORE-602:
-----------------------------------------

GitHub user DT-Priyanka opened a pull request:

    https://github.com/apache/apex-core/pull/479

    [Review only] APEXCORE-602: group events by cause

    This feature groups events that are raised due to a common cause. For example, if an operator fails, all of its downstream operators are redeployed, and this action raises a bunch of events. These events should share a common groupId for reference.
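To make the idea concrete, here is a minimal Java sketch using hypothetical classes `CauseEvent` and `GroupIdSource` (neither is part of Apex). It shows one failure producing several events that all carry the same groupId. Prefixing a per-container counter with the container id is an assumption, made so that ids generated independently by different containers cannot collide; the PR does not say how ids are formed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical event type: every event carries the id of the "root cause"
// group it belongs to. This is not the actual Apex event class.
class CauseEvent {
  final String type;       // e.g. "OperatorError", "OperatorStop"
  final int operatorId;    // operator the event refers to
  final String groupId;    // shared by all events with the same root cause

  CauseEvent(String type, int operatorId, String groupId) {
    this.type = type;
    this.operatorId = operatorId;
    this.groupId = groupId;
  }
}

// Hypothetical id source: a per-container counter prefixed with the
// container id (assumed format, not taken from the PR).
class GroupIdSource {
  private final String containerId;
  private final AtomicLong counter = new AtomicLong();

  GroupIdSource(String containerId) {
    this.containerId = containerId;
  }

  String next() {
    return containerId + "-" + counter.incrementAndGet();
  }
}

class GroupingDemo {
  // One operator failure yields an error event plus a stop event per
  // downstream operator, all tagged with the same groupId. (In the real flow
  // these events are raised by different components; this only shows the
  // shared id.)
  static List<CauseEvent> onOperatorFailure(GroupIdSource ids, int failed, int[] downstream) {
    String groupId = ids.next();
    List<CauseEvent> events = new ArrayList<>();
    events.add(new CauseEvent("OperatorError", failed, groupId));
    for (int op : downstream) {
      events.add(new CauseEvent("OperatorStop", op, groupId));
    }
    return events;
  }
}
```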
    
    The changes follow this path:
    **When an operator throws an exception:**
    1. The StreamingContainer hosting the operator generates an event groupId and raises an OperatorErrorEvent with that groupId.
    2. The StreamingContainer then sends the groupId to StrAM in its heartbeat.
    3. StrAM saves this groupId for later use.
    4. When StreamingAppMasterService detects that a container was killed with a non-zero exit code, it schedules redeployment of all downstream operators.
    5. While scheduling the redeployment of the downstream operators, StrAM maps the groupId to every scheduled operator.
    6. StrAM then sends undeploy signals to the operators, along with the groupId, in the heartbeat response. StrAM also raises an OperatorStop event that refers to the same groupId.
    7. The StreamingContainer remembers the groupId and sends it back in its heartbeat when it starts the operator again.
    8. StrAM then uses this groupId to raise an OperatorStart event.
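The eight steps above can be modelled in a single process. The sketch below is a hypothetical simplification (`GroupIdFlow` and its methods are illustrative names, not Apex classes): heartbeats become method calls, the DAG is a plain adjacency map, and the operators affected in step 4 are found with a breadth-first traversal. It shows the groupId reported in step 2 ending up on every OperatorStop and OperatorStart event.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical single-process model of steps 2-8; heartbeats are plain
// method calls and none of these names are actual Apex classes.
class GroupIdFlow {
  // DAG edges: operator id -> ids of directly downstream operators
  private final Map<Integer, List<Integer>> downstreamEdges;
  // Step 3: groupId reported in a heartbeat, keyed by the failed operator
  private final Map<Integer, String> errorGroupIds = new HashMap<>();
  // Step 5: groupId mapped to every operator scheduled for redeployment
  private final Map<Integer, String> redeployGroupIds = new HashMap<>();
  // Events raised by StrAM, recorded as "Type:operator:groupId"
  final List<String> events = new ArrayList<>();

  GroupIdFlow(Map<Integer, List<Integer>> downstreamEdges) {
    this.downstreamEdges = downstreamEdges;
  }

  // Steps 2-3: the container's heartbeat carries the groupId of the error.
  void onErrorHeartbeat(int failedOperator, String groupId) {
    errorGroupIds.put(failedOperator, groupId);
  }

  // Steps 4-6: the container exited with a non-zero code. Collect the failed
  // operator and everything downstream of it (breadth-first), tag each one
  // with the saved groupId, raise OperatorStop, and return the undeploy
  // signals (operator -> groupId) that would go into heartbeat responses.
  Map<Integer, String> onContainerFailed(int failedOperator) {
    String groupId = errorGroupIds.remove(failedOperator);
    Set<Integer> affected = new HashSet<>();
    Deque<Integer> queue = new ArrayDeque<>();
    queue.add(failedOperator);
    while (!queue.isEmpty()) {
      int op = queue.poll();
      if (affected.add(op)) {
        queue.addAll(downstreamEdges.getOrDefault(op, List.of()));
      }
    }
    Map<Integer, String> undeploySignals = new HashMap<>();
    for (int op : affected) {
      redeployGroupIds.put(op, groupId);
      undeploySignals.put(op, groupId);
      events.add("OperatorStop:" + op + ":" + groupId);
    }
    return undeploySignals;
  }

  // Steps 7-8: the restarted operator's heartbeat echoes the groupId; fall
  // back to the id StrAM stored in step 5 if the echo is missing.
  void onOperatorStartedHeartbeat(int operator, String echoedGroupId) {
    String stored = redeployGroupIds.remove(operator);
    String groupId = echoedGroupId != null ? echoedGroupId : stored;
    events.add("OperatorStart:" + operator + ":" + groupId);
  }
}
```

Keeping the groupId in StrAM (step 5) as well as echoing it from the container (step 7) lets StrAM still tag the OperatorStart event if a heartbeat arrives without one; that fallback is an assumption of this sketch, not something the PR describes.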
    
    StrAM also tracks container stops and starts in order to raise ContainerStop and ContainerStart events.
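How a ContainerStart event obtains its groupId is not spelled out above. One possible approach, sketched below under that assumption with a hypothetical `ContainerEventTracker`, is to let a replacement container inherit the groupId of the operators it hosts, and to leave the groupId empty (`null`) when those operators belong to different groups.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical tracker for container lifecycle events; names are
// illustrative and not part of Apex.
class ContainerEventTracker {
  // Events recorded as "Type:containerId:groupId"
  final List<String> events = new ArrayList<>();

  // A container stopped as part of a redeployment gets that redeployment's groupId.
  void containerStopped(String containerId, String groupId) {
    events.add("ContainerStop:" + containerId + ":" + groupId);
  }

  // A replacement container inherits the groupId of the operators it hosts,
  // but only if they all share one; otherwise no single root cause applies.
  void containerStarted(String containerId, Collection<String> hostedGroupIds) {
    Set<String> distinct = new HashSet<>(hostedGroupIds);
    String groupId = distinct.size() == 1 ? distinct.iterator().next() : null;
    events.add("ContainerStart:" + containerId + ":" + groupId);
  }
}
```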
    
    **When StrAM kills a container:**
    1. StreamingAppMasterService detects that the container was killed, removes its container agent, and creates a RedeploymentInformation with a groupId.
    2. The flow then continues with steps 4-8 above.
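For the kill path, no container reports an error, so StrAM has to produce the groupId itself. The sketch below assumes the id is generated when StrAM decides to kill the container (for example after a missed heartbeat) and is picked up once the container's completion is detected. `RedeployInfo` is a hypothetical stand-in for the PR's RedeploymentInformation, whose real fields may differ, and the use of a random UUID is likewise an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical stand-in for RedeploymentInformation.
class RedeployInfo {
  final String groupId;
  final Set<Integer> operators;

  RedeployInfo(String groupId, Set<Integer> operators) {
    this.groupId = groupId;
    this.operators = operators;
  }
}

class KillPathSketch {
  // containerId -> operators hosted there (stand-in for container agents)
  final Map<String, Set<Integer>> containerAgents = new HashMap<>();
  // groupIds generated when StrAM decides to kill a container
  private final Map<String, String> killGroupIds = new HashMap<>();

  // The kill decision is the root cause, so the groupId is created here.
  String killContainer(String containerId) {
    String groupId = UUID.randomUUID().toString();
    killGroupIds.put(containerId, groupId);
    return groupId;
  }

  // Step 1: completion detected; remove the agent and record the
  // redeployment. Steps 4-8 would then proceed using info.groupId.
  RedeployInfo onContainerCompleted(String containerId) {
    Set<Integer> hosted = containerAgents.remove(containerId);
    Set<Integer> operators = hosted != null ? hosted : Set.of();
    String groupId = killGroupIds.remove(containerId);
    if (groupId == null) {
      // Completion not initiated by StrAM: start a fresh group.
      groupId = UUID.randomUUID().toString();
    }
    return new RedeployInfo(groupId, operators);
  }
}
```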


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/DT-Priyanka/incubator-apex-core APEXCORE-602-events-grouping

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/apex-core/pull/479.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #479
    
----
commit ae8a6ab3ad3107627eb6bea886b4e167fa669d3b
Author: priya <pr...@apache.org>
Date:   2017-02-23T11:55:56Z

    APEXCORE-602: group events by cause

----


> Provide a "group-id" in the event object so that events are grouped together by a "root cause".
> -----------------------------------------------------------------------------------------------
>
>                 Key: APEXCORE-602
>                 URL: https://issues.apache.org/jira/browse/APEXCORE-602
>             Project: Apache Apex Core
>          Issue Type: Improvement
>            Reporter: Sanjay M Pujare
>            Assignee: Priyanka Gugale
>
> Provide a "group-id" in the event object so that events are grouped together by a "root cause". For example, a series of container restarts may all stem from a single failure in the application, but the current sequence of Stram events doesn't make that obvious. With a group-id, consumers of the events can read and analyze them more easily and focus on the root cause.
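On the consumer side, the benefit described in the issue comes from grouping. A minimal sketch, assuming a hypothetical `EventRecord` with `timestamp`, `type` and `groupId` fields and that every event has a non-null groupId (`Collectors.groupingBy` rejects null keys): events are sorted by time, grouped by groupId, and the earliest event in each group is reported as its root cause.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical consumer-side view of a Stram event; field names are assumptions.
class EventRecord {
  final long timestamp;
  final String type;
  final String groupId;

  EventRecord(long timestamp, String type, String groupId) {
    this.timestamp = timestamp;
    this.type = type;
    this.groupId = groupId;
  }
}

class RootCauseView {
  // Groups events by groupId; each group's list is in time order because the
  // stream is sorted first and groupingBy preserves encounter order.
  static Map<String, List<EventRecord>> byGroup(List<EventRecord> events) {
    return events.stream()
        .sorted(Comparator.comparingLong(e -> e.timestamp))
        .collect(Collectors.groupingBy(e -> e.groupId, LinkedHashMap::new, Collectors.toList()));
  }

  // Treats the earliest event of each group as the group's root cause.
  static Map<String, String> rootCauses(List<EventRecord> events) {
    Map<String, String> causes = new LinkedHashMap<>();
    byGroup(events).forEach((g, list) -> causes.put(g, list.get(0).type));
    return causes;
  }
}
```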



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)