Posted to issues@ozone.apache.org by "Stephen O'Donnell (Jira)" <ji...@apache.org> on 2022/04/27 20:00:00 UTC

[jira] [Updated] (HDDS-6661) TestBackgroundPipelineScrubber.testRun() fails intermittently

     [ https://issues.apache.org/jira/browse/HDDS-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen O'Donnell updated HDDS-6661:
------------------------------------
    Description: 
TestBackgroundPipelineScrubber.testRun() fails intermittently for me locally, with this trace:

{code}
2022-04-27 17:20:44,888 [main] INFO  pipeline.BackgroundPipelineScrubber (BackgroundPipelineScrubber.java:start(123)) - Starting Pipeline Scrubber Service.
2022-04-27 17:20:44,890 [main] INFO  pipeline.BackgroundPipelineScrubber (BackgroundPipelineScrubber.java:notifyStatusChanged(85)) - Service BackgroundPipelineScrubber transitions to RUNNING.
2022-04-27 17:20:47,905 [main] INFO  pipeline.BackgroundPipelineScrubber (BackgroundPipelineScrubber.java:stop(140)) - Stopping Pipeline Scrubber Service.
2022-04-27 17:20:47,905 [PipelineScrubberThread] WARN  pipeline.BackgroundPipelineScrubber (BackgroundPipelineScrubber.java:run(158)) - PipelineScrubberThread is interrupted, exit


Wanted but not invoked:
pipelineManager.scrubPipelines();
-> at org.apache.hadoop.hdds.scm.pipeline.TestBackgroundPipelineScrubber.testRun(TestBackgroundPipelineScrubber.java:97)
Actually, there were zero interactions with this mock.

Wanted but not invoked:
pipelineManager.scrubPipelines();
-> at org.apache.hadoop.hdds.scm.pipeline.TestBackgroundPipelineScrubber.testRun(TestBackgroundPipelineScrubber.java:97)
Actually, there were zero interactions with this mock.
{code}

I believe the reason is that when `notifyStatusChanged()` is called, the scrubber thread may be running rather than waiting.

In that case the notifyAll() call does nothing, because notifyAll() only wakes threads that are already waiting on the monitor. The thread then enters wait() and stays blocked until the wait interval expires or another notify() call is received.
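The lost notification can be reproduced deterministically in a few lines. This is a hypothetical, single-threaded demo (the class LostWakeupDemo and the method notifyThenWait are illustrative, not Ozone code). It calls notifyAll() while nothing is waiting, then calls wait(timeout). Because the monitor keeps no record of the earlier notification, the wait should run for roughly the full timeout.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical demo, not Ozone code: a notifyAll() issued while no thread is
// waiting is not remembered, so a later wait(timeout) runs for the full timeout.
final class LostWakeupDemo {
  private final Object lock = new Object();

  // Notify first (nothing is waiting yet), then wait; returns elapsed ms.
  long notifyThenWait(long timeoutMs) {
    synchronized (lock) {
      lock.notifyAll(); // no waiter exists, so this wakes nobody
    }
    long start = System.nanoTime();
    synchronized (lock) {
      try {
        lock.wait(timeoutMs); // the earlier notification is lost
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  }

  public static void main(String[] args) {
    long elapsed = new LostWakeupDemo().notifyThenWait(500);
    System.out.println("wait returned after " + elapsed + " ms");
  }
}
```

In the real scrubber, the notify comes from another thread while the worker is busy. The effect is the same as this forced ordering: the worker reaches wait() after the notification has already been discarded.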

I think the test can be made to run reliably by setting a volatile boolean alongside the notify, so that the thread skips the wait if a notification arrived while it was not waiting.
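A minimal sketch of the flag idea follows. It assumes a single worker and a hypothetical class FlaggedWakeup with a hypothetical flag name `pending`; it is not the Ozone implementation. The notifier sets the flag together with the notify, and the worker checks the flag before deciding to wait. A signal sent before the worker waits is therefore consumed immediately rather than lost.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the Ozone implementation: the notifier sets a
// volatile flag together with notifyAll(), and the worker checks the flag
// before waiting, so a notification sent while the worker was busy is kept.
final class FlaggedWakeup {
  private final Object lock = new Object();
  private volatile boolean pending = false; // hypothetical flag name

  // Notifier side: record the event, then wake the worker if it is waiting.
  void signal() {
    synchronized (lock) {
      pending = true;
      lock.notifyAll();
    }
  }

  // Worker side: skip the wait if a signal is pending; returns elapsed ms.
  long awaitSignalOrTimeout(long timeoutMs) {
    long start = System.nanoTime();
    synchronized (lock) {
      if (!pending) {
        try {
          lock.wait(timeoutMs);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
      pending = false; // consume the signal so the next call waits normally
    }
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  }

  public static void main(String[] args) {
    FlaggedWakeup w = new FlaggedWakeup();
    w.signal(); // sent while the worker is not yet waiting
    System.out.println("waited " + w.awaitSignalOrTimeout(5_000) + " ms");
  }
}
```

Every read and write of the flag here happens inside the synchronized block, so the monitor already guarantees visibility and `volatile` is not strictly required. It is kept to match the proposal.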

  was:
TestBackgroundPipelineScrubber.testRun() fails intermittently for me locally, with this trace:

{code}
2022-04-27 17:20:44,888 [main] INFO  pipeline.BackgroundPipelineScrubber (BackgroundPipelineScrubber.java:start(123)) - Starting Pipeline Scrubber Service.
2022-04-27 17:20:44,890 [main] INFO  pipeline.BackgroundPipelineScrubber (BackgroundPipelineScrubber.java:notifyStatusChanged(85)) - Service BackgroundPipelineScrubber transitions to RUNNING.
2022-04-27 17:20:47,905 [main] INFO  pipeline.BackgroundPipelineScrubber (BackgroundPipelineScrubber.java:stop(140)) - Stopping Pipeline Scrubber Service.
2022-04-27 17:20:47,905 [PipelineScrubberThread] WARN  pipeline.BackgroundPipelineScrubber (BackgroundPipelineScrubber.java:run(158)) - PipelineScrubberThread is interrupted, exit


Wanted but not invoked:
pipelineManager.scrubPipelines();
-> at org.apache.hadoop.hdds.scm.pipeline.TestBackgroundPipelineScrubber.testRun(TestBackgroundPipelineScrubber.java:97)
Actually, there were zero interactions with this mock.

Wanted but not invoked:
pipelineManager.scrubPipelines();
-> at org.apache.hadoop.hdds.scm.pipeline.TestBackgroundPipelineScrubber.testRun(TestBackgroundPipelineScrubber.java:97)
Actually, there were zero interactions with this mock.
{code}

I believe the reason is that when `notifyStatusChanged()` is called, the scrubber thread may be running rather than waiting.

In that case the notifyAll() call does nothing, because notifyAll() only wakes threads that are already waiting on the monitor. The thread then enters wait() and stays blocked until the wait interval expires or another notify() call is received.

After the safemode interval expires and BackgroundPipelineScrubber receives the notifyStatusChanged() call, do we want the processing thread to wake up and run immediately? If it does not, it may sleep for its full wait interval after the event is received, so the actual delay can be up to safemode_interval + thread_wait_interval.

I think we need another volatile boolean field, runImmediately. In `notifyStatusChanged()`, set it to true and then call notify():

{code}
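  synchronized (this) {
    runImmediately = true; // record the event even if the thread is not waiting
    notify();              // wake the thread if it is currently in wait()
  }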
{code}

Then in the run loop:

{code}
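  synchronized (this) {
    // Skip the wait if a notification arrived while the thread was busy.
    if (!runImmediately) {
      wait(intervalInMillis);
    }
    // Consume the flag so the next iteration waits normally.
    runImmediately = false;
  }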
{code}

This should handle the case where the notify arrives just before the thread calls wait(), and I believe it will fix the test too.
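The two fragments in this proposal can be combined into a self-contained loop for experimentation. The sketch below is hypothetical. ScrubberLoopSketch and scrubsPromptlyAfterNotify are illustrative names, and a Runnable stands in for pipelineManager.scrubPipelines(). Two further details are assumptions: the shutdown path interrupts the worker, as suggested by the "PipelineScrubberThread is interrupted, exit" log line, and the loop scrubs after each wait. The check mirrors testRun in spirit. It uses a deliberately large interval, sends the notification right after start() (possibly before the worker reaches wait()), and expects the scrub to run well within the timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified stand-in for BackgroundPipelineScrubber.
// Names are illustrative; this is not the real Ozone class.
final class ScrubberLoopSketch {
  private final long intervalInMillis;
  private final Runnable scrubAction; // stands in for pipelineManager.scrubPipelines()
  private volatile boolean runImmediately = false;
  private Thread thread;

  ScrubberLoopSketch(long intervalInMillis, Runnable scrubAction) {
    this.intervalInMillis = intervalInMillis;
    this.scrubAction = scrubAction;
  }

  void start() {
    thread = new Thread(this::runLoop, "PipelineScrubberThread");
    thread.setDaemon(true);
    thread.start();
  }

  // Assumed shutdown path: interrupt the worker and wait for it to exit.
  void stop() {
    thread.interrupt();
    try {
      thread.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  // Event side: the flag keeps the event even if the worker is not in wait().
  void notifyStatusChanged() {
    synchronized (this) {
      runImmediately = true;
      notify();
    }
  }

  private void runLoop() {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        synchronized (this) {
          if (!runImmediately) {
            wait(intervalInMillis); // ends early on notify(), else times out
          }
          runImmediately = false; // consume the pending event
        }
        scrubAction.run();
      }
    } catch (InterruptedException e) {
      // stop() interrupted the worker: exit the loop
    }
  }

  // testRun-style check: after a notification, the scrub should run long
  // before the (deliberately large) interval elapses.
  static boolean scrubsPromptlyAfterNotify(long intervalMs, long timeoutMs) {
    CountDownLatch scrubbed = new CountDownLatch(1);
    ScrubberLoopSketch scrubber = new ScrubberLoopSketch(intervalMs, scrubbed::countDown);
    scrubber.start();
    scrubber.notifyStatusChanged(); // may arrive before the worker reaches wait()
    try {
      return scrubbed.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      scrubber.stop();
    }
  }

  public static void main(String[] args) {
    System.out.println("scrub ran promptly: " + scrubsPromptlyAfterNotify(60_000, 5_000));
  }
}
```

Without the runImmediately check, a notification that lands before the worker's first wait() would leave it blocked for the full 60 second interval, and the check would time out. This is the same kind of failure as the one in the trace.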



> TestBackgroundPipelineScrubber.testRun() fails intermittently
> -------------------------------------------------------------
>
>                 Key: HDDS-6661
>                 URL: https://issues.apache.org/jira/browse/HDDS-6661
>             Project: Apache Ozone
>          Issue Type: Improvement
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
