Posted to issues@nifi.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/07/27 19:49:00 UTC

[jira] [Commented] (NIFI-10070) NiFi fails to delete/update component because it's still running, immediately after confirming that the component is stopped.

    [ https://issues.apache.org/jira/browse/NIFI-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572104#comment-17572104 ] 

ASF subversion and git services commented on NIFI-10070:
--------------------------------------------------------

Commit c77f85aafb7d055caf2215fc6d382cb3faed1544 in nifi's branch refs/heads/main from Nathan Gough
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=c77f85aafb ]

NIFI-10070 Updated status merging for ControllerService and ReportingTask entities

- Corrected node identifier selection in multiple Mergers

This closes #6154

Signed-off-by: David Handermann <ex...@apache.org>


> NiFi fails to delete/update component because it's still running, immediately after confirming that the component is stopped.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-10070
>                 URL: https://issues.apache.org/jira/browse/NIFI-10070
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Nathan Gough
>            Priority: Major
>              Labels: clustering, entity, merging, response
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> This issue has been identified by analyzing the logs, code, etc., of the system tests. Many of the system tests indicate that after each test (or after a set of tests), the flow must be torn down. The teardown stops all processors/reporting tasks and disables all controller services. It then waits for them to fully stop/disable, according to the REST API. Finally, it purges any queues and deletes all components.
> However, occasionally we see a failure in the step that deletes the components. One node will indicate that the component cannot be deleted because it's still running, so the REST API will send back a 409. However, before making this request, we've already made a request to get all components and checked that their state is STOPPED/DISABLED and that they have no active threads.
> If we look at the code that is used to determine whether or not they are STOPPED/DISABLED, it is using the "status" field in the Entity objects ( {{reportingTaskEntity.getStatus().getRunStatus()}} for example).
> However, the DTO also has a state field: {{ReportingTaskDTO.getState()}}
> We have a similar situation with Processors, Reporting Tasks, and Controller Services.
> In order to maintain backward compatibility, we need to keep both of these fields. However, the issue we have appears to be in the ReportingTaskEntityMerger, ProcessorEntityMerger, and ControllerServiceEntityMerger.
> These mergers do not merge the status field in the Entity; they take into account only the fields in the DTO. As a result, we can have one node indicating that the status is STOPPED with 0 threads while another node indicates STOPPED with 1 thread. The merged response may carry the STOPPED with 0 threads status, which appears to confirm that the component is fully stopped. At this point, a delete or update will fail because the component is not in the desired state on all nodes.
> We need to update the three Entity Mergers to ensure that they properly merge the status in the Entity objects as well.
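
The merging problem described above can be illustrated with a minimal, self-contained sketch. It is not NiFi's merger code: the StatusMergeSketch and NodeStatus types and the merge rule (sum the active thread counts, and let any non-STOPPED/DISABLED run status win) are assumptions chosen only to show why a per-node status must be combined across all nodes before a client treats a component as safe to delete.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: these types are hypothetical and are not NiFi classes.
final class StatusMergeSketch {

    /** Hypothetical per-node view of a component's status. */
    static final class NodeStatus {
        final String runStatus;
        final int activeThreadCount;

        NodeStatus(String runStatus, int activeThreadCount) {
            this.runStatus = runStatus;
            this.activeThreadCount = activeThreadCount;
        }
    }

    /** True when the run status is one a teardown would wait for. */
    static boolean isQuiescent(String runStatus) {
        return "STOPPED".equals(runStatus) || "DISABLED".equals(runStatus);
    }

    /**
     * Assumed merge rule: thread counts are summed across nodes, and a
     * non-quiescent run status reported by any node overrides STOPPED/DISABLED.
     */
    static NodeStatus merge(List<NodeStatus> nodeStatuses) {
        if (nodeStatuses.isEmpty()) {
            throw new IllegalArgumentException("at least one node status is required");
        }
        String mergedRunStatus = nodeStatuses.get(0).runStatus;
        int totalThreads = 0;
        for (NodeStatus status : nodeStatuses) {
            totalThreads += status.activeThreadCount;
            if (!isQuiescent(status.runStatus)) {
                mergedRunStatus = status.runStatus;
            }
        }
        return new NodeStatus(mergedRunStatus, totalThreads);
    }

    /** A component is safe to delete only if it is quiescent with no threads on any node. */
    static boolean isSafeToDelete(NodeStatus merged) {
        return isQuiescent(merged.runStatus) && merged.activeThreadCount == 0;
    }

    public static void main(String[] args) {
        // The scenario from the description: one node at 0 threads, another at 1.
        NodeStatus merged = merge(Arrays.asList(
                new NodeStatus("STOPPED", 0),
                new NodeStatus("STOPPED", 1)));
        System.out.println(merged.runStatus + ", threads=" + merged.activeThreadCount
                + ", safe to delete: " + isSafeToDelete(merged));
    }
}
```

Under this assumed rule, the two-node case from the description merges to STOPPED with 1 active thread, so a client polling the merged status would keep waiting instead of issuing a delete that one node rejects with a 409.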


