Posted to issues@storm.apache.org by "Ethan Li (Jira)" <ji...@apache.org> on 2021/09/23 17:57:00 UTC

[jira] [Resolved] (STORM-3767) NPE on getComponentPendingProfileActions

     [ https://issues.apache.org/jira/browse/STORM-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Li resolved STORM-3767.
-----------------------------
    Resolution: Fixed

> NPE on getComponentPendingProfileActions 
> -----------------------------------------
>
>                 Key: STORM-3767
>                 URL: https://issues.apache.org/jira/browse/STORM-3767
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 2.1.0, 2.2.0
>            Reporter: Ethan Li
>            Assignee: Ethan Li
>            Priority: Major
>             Fix For: 2.3.0
>
>         Attachments: Screen Shot 2021-04-27 at 11.09.33 AM.png
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When a topology is newly submitted and the scheduling loop takes too long, the component page in the UI may return an HTTP 500 error.
> This is caused by an NPE in Nimbus. An example:
> 1. When a scheduling loop finishes, Nimbus eventually updates the assignmentsBackend. If a topology is newly submitted, its entry is added to the idToAssignment map; otherwise, the existing entry is updated with the new assignment. The key point is that a new topology ID does not exist in idToAssignment until execution reaches this point.
> https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L2548-L2549
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java#L696
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/assignments/InMemoryAssignmentBackend.java#L63-L64
> 2. However, this assignmentsBackend update did not happen until 2021-04-23 15:30:14.299:
> {code:java}
> 2021-04-23 15:30:14.299 o.a.s.d.n.Nimbus timer [INFO] Setting new assignment for topology
> {code}
> whereas topology topo1-52-1619191499 had already been scheduled at 2021-04-23 15:25:13.887. The scheduling loop took longer than 5 minutes:
> {code:java}
> 2021-04-23 15:25:13.887 o.a.s.s.Cluster timer [INFO] STATUS - topo1-52-1619191499 Running - Fully Scheduled by DefaultResourceAwareStrategy (1297 states traversed in 1275 ms, backtracked 0 times)
> (other topologies were taking a long time)
> 2021-04-23 15:25:14.378 o.a.s.s.Cluster timer [INFO] STATUS - topo2-76-1612842912 Running - Fully Scheduled by DefaultResourceAwareStrategy (111 states traversed in 34 ms, backtracked 0 times)
> ...
> 2021-04-23 15:30:14.192 o.a.s.s.Cluster timer [INFO] STATUS - TrendingNowLES-11-1611713968 Not enough resources to schedule after evicting lower priority topologies. Additional Memory Required: 20128.0 MB (Available: 5411178.0 MB). Additional CPU Required: 1010.0% CPU (Available: 3100.0 % CPU).Cannot schedule by DefaultResourceAwareStrategy (65644 states traversed in 299804 ms, backtracked 65555 times, 89 of 150 executors scheduled)
> ...
> 2021-04-23 15:30:14.216 o.a.s.s.Cluster timer [INFO] STATUS - evaluateplus-dev-47-1605825401 Running - Fully Scheduled by GenericResourceAwareStrategy (41 states traversed in 10 ms, backtracked 0 times)
> {code}
> 3. During this period, the idToAssignment map in assignmentsBackend did not have an entry for topo1-52-1619191499, so when the component page in the UI was visited,
> https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L3613-L3614
> https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L3100
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java#L194
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/assignments/InMemoryAssignmentBackend.java#L69
> the lookup returned null for the assignment, resulting in an NPE.
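Steps 1 and 3 together can be illustrated with a minimal, hypothetical sketch: ComponentPageSketch is not a Storm class, and a plain map of topology ID to (executor to node) stands in for the assignments backend. A topology gains an entry only when setAssignment runs, so a reader that dereferences the looked-up value without a null check fails in the window before that write. The guarded variant is only one possible defensive pattern; this report does not describe how the 2.3.0 fix actually handles the missing entry.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical reader illustrating the NPE path; a plain map
// (topology ID -> executor -> node) stands in for the assignments backend.
class ComponentPageSketch {
    private final Map<String, Map<String, String>> idToAssignment = new HashMap<>();

    // Stand-in for Nimbus writing the assignment after scheduling: inserts an
    // entry for a new topology or overwrites the entry of an existing one.
    void setAssignment(String topoId, Map<String, String> executorToNode) {
        idToAssignment.put(topoId, executorToNode);
    }

    // Unsafe: assumes an entry exists. For a topology that is scheduled but
    // not yet written, get() returns null and size() throws NullPointerException.
    int executorCountUnsafe(String topoId) {
        Map<String, String> assignment = idToAssignment.get(topoId);
        return assignment.size();
    }

    // Defensive variant: treat a missing entry as "nothing assigned yet".
    int executorCountSafe(String topoId) {
        Map<String, String> assignment = idToAssignment.get(topoId);
        return assignment == null ? 0 : assignment.size();
    }
}
{code}

Before setAssignment runs for a topology, executorCountUnsafe throws NullPointerException while executorCountSafe returns 0.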
> This can be reproduced easily by adding a sleep anywhere between
> {code:title=Nimbus.java}
>             Map<String, SchedulerAssignment> newSchedulerAssignments =
>                     computeNewSchedulerAssignments(existingAssignments, topologies, bases, scratchTopoId);
> {code}
> and
> {code:title=Nimbus.java}
>  state.setAssignment(topoId, assignment, td.getConf());
> {code}
> then submitting a new topology and visiting its component page in the UI.
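The reproduction steps above can be modeled in isolation. This sketch is hypothetical and uses no Storm classes: a background thread "computes" an assignment and then waits before writing it, with a CountDownLatch standing in for the injected sleep so the window is deterministic, while the main thread plays the role of the UI request. The names RaceWindowSketch and run are illustrative only.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical model of the race: a gap between computing the new assignment
// and writing it lets a concurrent read see no entry for the topology.
class RaceWindowSketch {
    // Returns {value seen inside the window, value seen after the write}.
    static String[] run(String topoId) {
        Map<String, String> idToAssignment = new ConcurrentHashMap<>();
        CountDownLatch computed = new CountDownLatch(1);
        CountDownLatch releaseWrite = new CountDownLatch(1);

        Thread scheduler = new Thread(() -> {
            String newAssignment = "assignment-for-" + topoId; // stand-in for computeNewSchedulerAssignments
            computed.countDown();
            try {
                releaseWrite.await(); // stand-in for the injected sleep / slow scheduling loop
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            idToAssignment.put(topoId, newAssignment); // stand-in for state.setAssignment
        });
        scheduler.start();

        String duringWindow;
        try {
            computed.await();
            duringWindow = idToAssignment.get(topoId); // UI request inside the window
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            releaseWrite.countDown(); // let the "scheduler" finish its write
        }
        try {
            scheduler.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new String[] {duringWindow, idToAssignment.get(topoId)};
    }

    public static void main(String[] args) {
        String[] seen = run("topo1-52-1619191499");
        System.out.println("during window: " + seen[0] + ", after write: " + seen[1]);
    }
}
{code}

Run as-is, it prints "during window: null, after write: assignment-for-topo1-52-1619191499": any read that lands between the compute step and the write sees no assignment.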



--
This message was sent by Atlassian Jira
(v8.3.4#803005)