Posted to hdfs-dev@hadoop.apache.org by "Max Mizikar (Jira)" <ji...@apache.org> on 2020/06/18 13:57:00 UTC

[jira] [Created] (HDFS-15420) approx scheduled blocks not resetting over time

Max Mizikar created HDFS-15420:
----------------------------------

             Summary: approx scheduled blocks not resetting over time
                 Key: HDFS-15420
                 URL: https://issues.apache.org/jira/browse/HDFS-15420
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: block placement
    Affects Versions: 3.0.0, 2.6.0
         Environment: Our 2.6.0 environment is a 3 node cluster running cdh5.15.0.
Our 3.0.0 environment is a 4 node cluster running cdh6.3.0.
            Reporter: Max Mizikar
         Attachments: Screenshot from 2020-06-18 09-29-57.png, Screenshot from 2020-06-18 09-31-15.png

We have been seeing large numbers of scheduled blocks that never get cleared out. This prevents blocks from being placed even when there is plenty of space on the system.
Here is an example of the block growth over 24 hours on one of our systems running 2.6.0:
 !Screenshot from 2020-06-18 09-29-57.png! 
Here is an example of the block growth over 24 hours on one of our systems running 3.0.0:
 !Screenshot from 2020-06-18 09-31-15.png! 
https://issues.apache.org/jira/browse/HDFS-1172 appears to have been the main issue we were hitting on 2.6.0, and the growth has decreased since upgrading to 3.0.0. However, there is still a systemic growth in scheduled blocks over time, and we still have to restart the namenode on occasion to reset the count. I have not yet determined what is causing the leaked blocks on 3.0.0.

Looking into the issue, I discovered that the intention is for the scheduled-block count to slowly drain back to 0 after errors cause blocks to be leaked:
{code}
  /** Increment the number of blocks scheduled. */
  void incrementBlocksScheduled(StorageType t) {
    currApproxBlocksScheduled.add(t, 1);
  }
  
  /** Decrement the number of blocks scheduled. */
  void decrementBlocksScheduled(StorageType t) {
    if (prevApproxBlocksScheduled.get(t) > 0) {
      prevApproxBlocksScheduled.subtract(t, 1);
    } else if (currApproxBlocksScheduled.get(t) > 0) {
      currApproxBlocksScheduled.subtract(t, 1);
    } 
    // its ok if both counters are zero.
  }
  
  /** Adjusts curr and prev number of blocks scheduled every few minutes. */
  private void rollBlocksScheduled(long now) {
    if (now - lastBlocksScheduledRollTime > BLOCKS_SCHEDULED_ROLL_INTERVAL) {
      prevApproxBlocksScheduled.set(currApproxBlocksScheduled);
      currApproxBlocksScheduled.reset();
      lastBlocksScheduledRollTime = now;
    }
  }
{code}

However, this code does not do what is intended when the system has a constant flow of written blocks. Once a leaked count makes it into prevApproxBlocksScheduled, each newly scheduled block increments currApproxBlocksScheduled, and when that block completes, the decrement is taken from prevApproxBlocksScheduled instead, so the leaked block is never removed from the approximate count. For errors to be corrected, we would have to write no data at all for the full 10-minute roll interval. The number of blocks we write per 10 minutes is quite high, which allows the error in the approximate counts to grow very large.
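To make the failure mode concrete, here is a minimal standalone sketch of the counter behavior (this is not the Hadoop source: the per-StorageType EnumCounters are reduced to two plain longs for a single storage type, and the 10-minute roll is triggered explicitly). One block is leaked up front, then every window schedules and completes exactly one block:
{code}
// Standalone sketch, not Hadoop source: models the two approx-scheduled
// counters for one storage type as plain longs.
public class ScheduledBlocksLeakDemo {
  static long curr = 0; // stands in for currApproxBlocksScheduled
  static long prev = 0; // stands in for prevApproxBlocksScheduled

  static void increment() { curr++; }

  static void decrement() {
    // Completions drain the previous window before the current one,
    // mirroring decrementBlocksScheduled().
    if (prev > 0) {
      prev--;
    } else if (curr > 0) {
      curr--;
    }
  }

  static void roll() {
    // Mirrors rollBlocksScheduled(): curr moves to prev, curr resets.
    prev = curr;
    curr = 0;
  }

  public static void main(String[] args) {
    increment(); // one block scheduled but never reported back: the leak

    for (int window = 1; window <= 5; window++) {
      roll();      // the leaked count moves from curr into prev
      increment(); // steady traffic: one new block scheduled this window...
      decrement(); // ...and its completion drains prev (the leaked count),
                   // leaving the new block's own count stranded in curr
      System.out.printf("window %d: curr=%d prev=%d total=%d%n",
          window, curr, prev, curr + prev);
    }
    // Prints total=1 in every window: under constant writes the leaked
    // block shuttles between the two counters and is never cleared.
  }
}
{code}
With no traffic at all, two consecutive rolls flush the leak (the first moves it into prev, the second overwrites prev with a zeroed curr), which is the self-correction described in HADOOP-3707; one write per roll interval is enough to defeat it indefinitely.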

The comments on the ticket for the original implementation, https://issues.apache.org/jira/browse/HADOOP-3707, suggest this issue was known, though it is not clear to me whether its severity was understood at the time:
> So if there are some blocks that are not reported back by the datanode, they will eventually get adjusted (usually 10 min; bit longer if datanode is continuously receiving blocks).
The comment suggests the count will eventually be cleared out, but in our case it never is.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org