Posted to dev@oozie.apache.org by "Peter Cseh (JIRA)" <ji...@apache.org> on 2018/05/23 12:07:00 UTC

[jira] [Commented] (OOZIE-3254) [coordinator] LAST_ONLY and NONE execution modes: possible OutOfMemoryError when there are too many coordinator actions to materialize

    [ https://issues.apache.org/jira/browse/OOZIE-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487137#comment-16487137 ] 

Peter Cseh commented on OOZIE-3254:
-----------------------------------

This can occur with more realistic frequencies when a coordinator is resubmitted (e.g. after disaster recovery) without changing the start time. Since the execution mode is LAST_ONLY, why should we care about the start time?

> [coordinator] LAST_ONLY and NONE execution modes: possible OutOfMemoryError when there are too many coordinator actions to materialize
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: OOZIE-3254
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3254
>             Project: Oozie
>          Issue Type: Bug
>          Components: coordinator
>    Affects Versions: 5.0.0
>            Reporter: Andras Piros
>            Assignee: Andras Piros
>            Priority: Major
>
> If there is a coordinator job defined with a {{frequency}} by the minute (e.g. {{frequency="* * * * *"}}), and {{start-time}} lies well in the past, and the coordinator job's {{execution-mode}} is {{LAST_ONLY}} or {{NONE}}, it can happen that too many {{CoordinatorActionBean}} instances are kept on JVM heap within {{CoordMaterializeTransitionXCommand#insertList}} as those execution modes [*omit the check for the {{throttle}} value*|https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordMaterializeTransitionXCommand.java#L439-L443].
> As a consequence, we can see several hundred thousand log entries [*trying to grow {{CoordMaterializeTransitionXCommand#insertList}}*|https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordMaterializeTransitionXCommand.java#L560-L566]:
> {noformat}
> [user@host ~]$ grep 'In storeToDB() coord action id' /var/log/oozie/oozie-HOSTNAME.log.out | wc -l
> 478408
> {noformat}
> A much worse consequence is that those {{CoordinatorActionBean}} instances remain reachable from a GC root (through the {{insertList}} itself), so the JVM cannot free them until a subsequent call to {{insertList.clear()}}. In the worst case, this results in an {{OutOfMemoryError}}.
> {{CoordMaterializeTransitionXCommand#insertList}} should be checked against a configurable limit (default value something like 1000), and persisted / cleared when that limit is reached.
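
The proposed fix (persist and clear the list once it reaches a configurable limit) could look roughly like the sketch below. This is only an illustration of the batching idea, not Oozie code: the class BoundedInsertBuffer, its methods and the persister callback are all hypothetical names, and the real change would live inside CoordMaterializeTransitionXCommand and write through Oozie's own persistence layer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not part of Oozie): buffers items and hands them to a
 * persister in batches of at most `limit`, so the buffer never grows unbounded.
 */
public class BoundedInsertBuffer<T> {
    private final int limit;                      // assumed configurable, e.g. 1000
    private final Consumer<List<T>> persister;    // stand-in for the real DB write
    private final List<T> insertList = new ArrayList<>();
    private int flushCount = 0;

    public BoundedInsertBuffer(int limit, Consumer<List<T>> persister) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
        this.persister = persister;
    }

    /** Adds one item and flushes as soon as the limit is reached. */
    public void add(T item) {
        insertList.add(item);
        if (insertList.size() >= limit) {
            flush();
        }
    }

    /** Persists whatever is buffered, then clears the buffer so the items can be collected. */
    public void flush() {
        if (insertList.isEmpty()) {
            return;
        }
        // Pass a copy so the persister never sees the list being cleared underneath it.
        persister.accept(new ArrayList<>(insertList));
        insertList.clear();
        flushCount++;
    }

    public int size() {
        return insertList.size();
    }

    public int getFlushCount() {
        return flushCount;
    }

    public static void main(String[] args) {
        int[] persisted = {0};
        BoundedInsertBuffer<String> buffer =
                new BoundedInsertBuffer<>(1000, batch -> persisted[0] += batch.size());
        // Same order of magnitude as the log line count quoted above.
        for (int i = 0; i < 478_408; i++) {
            buffer.add("action-" + i);
        }
        buffer.flush(); // persist the final partial batch
        System.out.println(persisted[0] + " items persisted in "
                + buffer.getFlushCount() + " batches");
    }
}
```

With a limit of 1000, at most 1000 items are held at any time regardless of how many actions are materialized; the caller only has to remember the final flush() for the last partial batch.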



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)