You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@nifi.apache.org by "Mark Payne (Jira)" <ji...@apache.org> on 2022/07/25 19:59:00 UTC

[jira] [Created] (NIFI-10279) Increase timeouts in automated tests around NiFi Stateless

Mark Payne created NIFI-10279:
---------------------------------

             Summary: Increase timeouts in automated tests around NiFi Stateless
                 Key: NIFI-10279
                 URL: https://issues.apache.org/jira/browse/NIFI-10279
             Project: Apache NiFi
          Issue Type: Task
          Components: Core Framework, Extensions
            Reporter: Mark Payne
            Assignee: Mark Payne


NiFi Stateless tends to be very fast. However, on constrained resources like Github Actions, the setup (such as unpacking nars) can take a while. This causes intermittent timeouts.

For example, in the Stateless System Tests, several tests have a timeout of 10 seconds. This is generally plenty of time to run on a laptop that's not overly taxed. For example, on a 2019 Macbook Pro, the tests run in about 2 seconds for me. However, we had a period of about 1 week in which Github Actions was frequently failing with an error such as:
{code:java}
Error:  org.apache.nifi.stateless.basics.AsyncCommitCallbackIT.testSynchronousCommitCleanupAfterFlowFilesTerminated  Time elapsed: 10.067 s  <<< ERROR!
300 {code}
A timeout of 10 seconds is too restrictive for this environment.

Similarly, the ExecuteStateless unit test can take a while when it must unpack NARs. We occasionally see the following error in the ci-workflow build due to a unit test failure:
{code:java}
Tests run: 11, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 83.578 s <<< FAILURE! - in org.apache.nifi.processors.stateless.TestExecuteStateless
org.apache.nifi.processors.stateless.TestExecuteStateless.testRouteToFailureWithInput  Time elapsed: 72.185 s  <<< FAILURE!
org.opentest4j.AssertionFailedError: expected: <3> but was: <1>
    at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
    at org.junit.jupiter.api.AssertionUtils.failNotEqual(AssertionUtils.java:62)
    at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150)
    at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:145)
    at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:527)
    at org.apache.nifi.util.StandardProcessorTestRunner.assertTransferCount(StandardProcessorTestRunner.java:360)
    at org.apache.nifi.processors.stateless.TestExecuteStateless.testRouteToFailureWithInput(TestExecuteStateless.java:104)
... {code}
At first glance, it appears to indicate that data was improperly processed. But this occurs only occasionally. This appears to be due the fact that the ordering of tests is not guaranteed and can change from invocation to invocation. If this test (testRouteToFailureWithInput) is the first test to be run, this test can fail. This is because of this line of code in the test:
{code:java}
runner.run(3, true, true, 60000L); {code}
It specifies that the processor should run 3 iterations and times out after 60,000 milliseconds. If we analyze the log output, we see that indeed this was the first test run and this log line attracts a lot of attention:
{code:java}
2022-07-26 03:24:38:527 +0900 [pool-2-thread-1] INFO org.apache.nifi.nar.NarUnpacker - NAR loading process took 51455278686 nanoseconds (51 seconds).{code}
So here we can see that unpacking NARs took 51 seconds. Indeed, updating the test to have a timeout of 100 milliseconds (a very short timeout that we expect to trigger) produces the exact same output for me locally.

We need to use a longer timeout here, as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)