Posted to hdfs-dev@hadoop.apache.org by "ConfX (Jira)" <ji...@apache.org> on 2023/07/18 16:44:00 UTC

[jira] [Created] (HDFS-17102) Timeout encountered when running TestDataNodeOutlierDetectionViaMetrics

ConfX created HDFS-17102:
----------------------------

             Summary: Timeout encountered when running TestDataNodeOutlierDetectionViaMetrics
                 Key: HDFS-17102
                 URL: https://issues.apache.org/jira/browse/HDFS-17102
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: ConfX
         Attachments: reproduce.sh

h2. What happened:

{{TestDataNodeOutlierDetectionViaMetrics}} times out when {{dfs.datanode.peer.metrics.min.outlier.detection.samples}} is set to 0 or a negative value.
h2. Where's the bug:

In {{TestDataNodeOutlierDetectionViaMetrics.injectFastNodesSamples}}, the test injects {{2 * peerMetrics.getMinOutlierDetectionSamples()}} packet-latency samples for a node:
{noformat}
      for (int i = 0;
           i < 2 * peerMetrics.getMinOutlierDetectionSamples();
           ++i) {
        peerMetrics.addSendPacketDownstream(
            nodeName, random.nextInt(FAST_NODE_MAX_LATENCY_MS));
      }{noformat}
Similar logic appears in {{injectSlowNodesSamples}}. The problem is that if {{dfs.datanode.peer.metrics.min.outlier.detection.samples}} is set to 0 or a negative value, the loop bound {{2 * peerMetrics.getMinOutlierDetectionSamples()}} is not positive, so no samples are injected, and the later {{waitFor}}:
{noformat}
    GenericTestUtils.waitFor(new Supplier<Boolean>() {
      @Override
      public Boolean get() {
        return peerMetrics.getOutliers().size() > 0;
      }
    }, 500, 100_000);{noformat}
would keep waiting until the 100-second ({{100_000}} ms) timeout expires, because no outliers can be reported without any samples.
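To make the failure mode concrete, here is a minimal, self-contained model of the injection loop and the polling wait. It is a sketch under stated assumptions: a plain {{List}} stands in for {{DataNodePeerMetrics}}, the latency bound is a made-up value, and {{pollUntil}} is a hypothetical simplification of {{GenericTestUtils.waitFor}} that returns {{false}} instead of throwing. Since "at least one sample was recorded" is a necessary condition for {{getOutliers().size() > 0}}, showing that it never holds is enough to show the real condition never holds either.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical, simplified model of the injection loop in
 * TestDataNodeOutlierDetectionViaMetrics. It does NOT use the real
 * DataNodePeerMetrics class; a plain list stands in for the recorded samples.
 */
class OutlierInjectionDemo {
  static final int FAST_NODE_MAX_LATENCY_MS = 5; // hypothetical value

  /** Mirrors the test loop: records 2 * minSamples latency samples. */
  static List<Integer> injectSamples(int minSamples, Random random) {
    List<Integer> samples = new ArrayList<>();
    // When minSamples <= 0, the bound 2 * minSamples <= 0 and the body never runs.
    for (int i = 0; i < 2 * minSamples; ++i) {
      samples.add(random.nextInt(FAST_NODE_MAX_LATENCY_MS));
    }
    return samples;
  }

  /** Polls a condition until it holds or the deadline passes, like waitFor. */
  static boolean pollUntil(BooleanSupplier check, long intervalMs, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (check.getAsBoolean()) {
        return true;
      }
      Thread.sleep(intervalMs);
    }
    // The real GenericTestUtils.waitFor throws TimeoutException at this point.
    return false;
  }

  public static void main(String[] args) throws InterruptedException {
    List<Integer> samples = injectSamples(0, new Random(42));
    // "Any sample recorded" is a necessary (not sufficient) stand-in for
    // getOutliers().size() > 0; with zero samples it can never become true.
    boolean satisfied = pollUntil(() -> !samples.isEmpty(), 10, 100);
    System.out.println("samples=" + samples.size() + ", satisfied=" + satisfied);
    // prints "samples=0, satisfied=false"
  }
}
```

The short 100 ms timeout in {{main}} keeps the demo fast; the real test waits 100 seconds before failing.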
h2. How to reproduce:

(1) Set {{dfs.datanode.peer.metrics.min.outlier.detection.samples}} to {{0}}
(2) Run test: {{org.apache.hadoop.hdfs.server.datanode.metrics.TestDataNodeOutlierDetectionViaMetrics#testOutlierIsDetected}}
h2. Stacktrace:

 
{noformat}
java.util.concurrent.TimeoutException:
Timed out waiting for condition.
Thread diagnostics:
Timestamp: 2023-07-04 04:08:54,535
"Reference Handler" daemon prio=10 tid=2 runnable
java.lang.Thread.State: RUNNABLE
        at java.base@11.0.18/java.lang.ref.Reference.waitForReferencePendingList(Native Method)
        at java.base@11.0.18/java.lang.ref.Reference.processPendingReferences(Reference.java:241)
        at java.base@11.0.18/java.lang.ref.Reference$ReferenceHandler.run(Reference.java:213)
"surefire-forkedjvm-command-thread" daemon prio=5 tid=23 runnable
java.lang.Thread.State: RUNNABLE
...
{noformat}
For easy reproduction, run the attached {{reproduce.sh}}.

We are happy to provide a patch if this issue is confirmed.
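One possible shape for such a patch is a fail-fast check before the injection loops, sketched below as a hypothetical standalone helper (the class and method names are illustrative and do not exist in the Hadoop code base). It rejects a non-positive {{dfs.datanode.peer.metrics.min.outlier.detection.samples}} value immediately instead of letting {{waitFor}} run for 100 seconds; whether such a check belongs in the test or in {{DataNodePeerMetrics}} itself is an open question.

```java
/**
 * Hypothetical guard a patch could add before injecting samples: reject a
 * non-positive minimum up front so the failure is immediate and explicit.
 */
class MinSamplesGuard {
  static final String KEY = "dfs.datanode.peer.metrics.min.outlier.detection.samples";

  /** Returns how many samples to inject per node, or throws if the setting is unusable. */
  static int samplesToInject(int minOutlierDetectionSamples) {
    if (minOutlierDetectionSamples <= 0) {
      throw new IllegalArgumentException(
          KEY + " must be positive, got " + minOutlierDetectionSamples);
    }
    // Same count the existing loops use (twice the configured minimum);
    // multiplyExact throws ArithmeticException instead of silently overflowing.
    return Math.multiplyExact(2, minOutlierDetectionSamples);
  }

  public static void main(String[] args) {
    System.out.println(samplesToInject(4)); // prints 8
    try {
      samplesToInject(0);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Failing fast keeps the misconfiguration visible in the test report as a clear message, rather than as an opaque {{TimeoutException}} followed by thread dumps.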

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
