Posted to hdfs-dev@hadoop.apache.org by "tomscut (Jira)" <ji...@apache.org> on 2021/12/11 02:45:00 UTC

[jira] [Created] (HDFS-16379) Reset fullBlockReportLeaseId after any exceptions

tomscut created HDFS-16379:
------------------------------

Summary: Reset fullBlockReportLeaseId after any exceptions
Key: HDFS-16379
URL: https://issues.apache.org/jira/browse/HDFS-16379
Project: Hadoop HDFS
Issue Type: Bug
Reporter: tomscut
Assignee: tomscut

We recently ran into full block report (FBR) problems in our production environment, which we resolved by applying HDFS-12914 and HDFS-14314.

However, the following situation can still occur:
1 The DN obtains a *fullBlockReportLeaseId* via heartbeat.

2 The DN triggers a blockReport, but an exception occurs (this may be rare, but it is possible), and the DN then retries multiple times *without resetting the leaseID*, because the leaseID is currently reset only when the blockReport succeeds.

3 After a while, the exception clears, but by then the leaseID has expired. *Because the NN does not throw an exception for an expired lease, the DN treats the blockReport as successful.* The blockReport is therefore not actually processed this time, and the blocks are not reported until the next scheduled blockReport.
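
The sequence above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual NameNode or DataNode code: ToyNameNode, grantLease, blockReport, sendReport and the millisecond timestamps are all hypothetical names invented for illustration. It shows how a report sent with an expired lease can be dropped while the sender still sees a normal return.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: every name here is hypothetical, not a Hadoop API.
class ExpiredLeaseSketch {

    /** Toy NameNode that hands out leases and silently skips reports with expired ones. */
    static class ToyNameNode {
        private final long leaseTtlMs;
        private final Map<Long, Long> leaseExpiry = new HashMap<>();
        private long nextLeaseId = 1;
        int reportsProcessed = 0;

        ToyNameNode(long leaseTtlMs) {
            this.leaseTtlMs = leaseTtlMs;
        }

        /** Grants a lease that is valid until nowMs + leaseTtlMs. */
        long grantLease(long nowMs) {
            long id = nextLeaseId++;
            leaseExpiry.put(id, nowMs + leaseTtlMs);
            return id;
        }

        /** Returns normally in every case; an unknown or expired lease is skipped without an exception. */
        void blockReport(long leaseId, long nowMs) {
            Long expiry = leaseExpiry.get(leaseId);
            if (expiry == null || nowMs > expiry) {
                return; // report dropped, caller is not told
            }
            leaseExpiry.remove(leaseId);
            reportsProcessed++;
        }
    }

    /** DataNode-side view: no exception is interpreted as success. */
    static boolean sendReport(ToyNameNode nn, long leaseId, long nowMs) {
        nn.blockReport(leaseId, nowMs);
        return true;
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode(1000L);
        long lease = nn.grantLease(0L);                   // step 1: lease obtained at t=0
        boolean believed = sendReport(nn, lease, 5000L);  // step 3: retry lands at t=5000, lease expired
        System.out.println("DN believes success: " + believed
                + ", reports processed by NN: " + nn.reportsProcessed);
    }
}
```

In this toy run the DN-side call returns true while the NN-side counter stays at 0, which is the mismatch described in step 3.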

Therefore, *should we consider resetting the fullBlockReportLeaseId in the finally block*? The advantage is that the DN never retries with an expired lease. The downside is that, while the exception persists, each heartbeat requests a new fullBlockReportLeaseId, but I think this cost is negligible.
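
To make the proposal concrete, here is a minimal sketch that contrasts the two reset strategies. It is not the actual DataNode code: ToyDataNode, ReportSender, onHeartbeat and both blockReport* methods are hypothetical, and the convention that a leaseID of 0 means "no lease held" is an assumption of this toy model.

```java
import java.io.IOException;

// Toy model only: every name here is hypothetical, not a Hadoop API.
class LeaseResetSketch {

    /** Stand-in for the RPC that sends a full block report to the NN. */
    interface ReportSender {
        void send(long leaseId) throws IOException;
    }

    static class ToyDataNode {
        // Assumption of this sketch: 0 means "no lease held".
        long fullBlockReportLeaseId = 0;

        /** True when the next heartbeat should ask the NN for a new lease. */
        boolean needsLease() {
            return fullBlockReportLeaseId == 0;
        }

        /** Stores a lease returned by the heartbeat response, if any. */
        void onHeartbeat(long leaseIdFromNn) {
            if (leaseIdFromNn != 0) {
                fullBlockReportLeaseId = leaseIdFromNn;
            }
        }

        /** Behavior described in the issue: the leaseID is cleared only on success. */
        boolean blockReportResetOnSuccess(ReportSender sender) {
            try {
                sender.send(fullBlockReportLeaseId);
                fullBlockReportLeaseId = 0;
                return true;
            } catch (IOException e) {
                return false; // stale leaseID is kept and reused by the next retry
            }
        }

        /** Proposed behavior: the leaseID is cleared whether or not the report succeeds. */
        boolean blockReportResetInFinally(ReportSender sender) {
            try {
                sender.send(fullBlockReportLeaseId);
                return true;
            } catch (IOException e) {
                return false;
            } finally {
                fullBlockReportLeaseId = 0; // forces the next heartbeat to request a fresh lease
            }
        }
    }

    public static void main(String[] args) {
        ReportSender failing = id -> { throw new IOException("simulated failure"); };

        ToyDataNode current = new ToyDataNode();
        current.onHeartbeat(42L);
        current.blockReportResetOnSuccess(failing);
        System.out.println("reset on success, needs new lease: " + current.needsLease());

        ToyDataNode proposed = new ToyDataNode();
        proposed.onHeartbeat(42L);
        proposed.blockReportResetInFinally(failing);
        System.out.println("reset in finally, needs new lease: " + proposed.needsLease());
    }
}
```

With the finally-based variant, a failed attempt leaves the toy DN without a lease, so its next heartbeat would ask for a new one instead of retrying with a leaseID that may have expired in the meantime. That extra lease request per failed attempt is the cost mentioned above.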

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org