You are viewing a plain text version of this content. The canonical link for it is here.
Posted to hdfs-issues@hadoop.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/10/06 07:44:00 UTC

[jira] [Commented] (HDFS-16689) Standby NameNode crashes when transitioning to Active with in-progress tailer

    [ https://issues.apache.org/jira/browse/HDFS-16689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17613351#comment-17613351 ] 

ASF GitHub Bot commented on HDFS-16689:
---------------------------------------

hadoop-yetus commented on PR #4744:
URL: https://github.com/apache/hadoop/pull/4744#issuecomment-1269507559

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 40s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 4 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  40m  0s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 42s |  |  trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 35s |  |  trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 14s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 40s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 17s |  |  trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 42s |  |  trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 56s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  24m 28s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 22s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 26s |  |  the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 22s |  |  the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 22s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  2s |  |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 284 unchanged - 1 fixed = 284 total (was 285)  |
   | +1 :green_heart: |  mvnsite  |   1m 29s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 59s |  |  the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 35s |  |  the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 46s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  24m 45s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 245m 21s |  |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 57s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 359m 34s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4744/14/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4744 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 942c597e4795 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 8cd18a15b892f6b949d24edfab335255a28f5eee |
   | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4744/14/testReport/ |
   | Max. process+thread count | 3468 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4744/14/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> Standby NameNode crashes when transitioning to Active with in-progress tailer
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-16689
>                 URL: https://issues.apache.org/jira/browse/HDFS-16689
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Standby NameNode crashes when transitioning to Active with a in-progress tailer. And the error message like blew:
> {code:java}
> Caused by: java.lang.IllegalStateException: Cannot start writing at txid X when there is a stream available for read: ByteStringEditLog[X, Y], ByteStringEditLog[X, 0]
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:344)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.openForWrite(FSEditLogAsync.java:113)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1423)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:2132)
> 	... 36 more
> {code}
> After tracing and found there is a critical bug in *EditlogTailer#catchupDuringFailover()* when *DFS_HA_TAILEDITS_INPROGRESS_KEY* is true. Because *catchupDuringFailover()* try to replay all missed edits from JournalNodes with *onlyDurableTxns=true*. It may cannot replay any edits when they are some abnormal JournalNodes. 
> Reproduce method, suppose:
> - There are 2 namenode, namely NN0 and NN1, and the status of echo namenode is Active, Standby respectively. And there are 3 JournalNodes, namely JN0, JN1 and JN2. 
> - NN0 try to sync 3 edits to JNs with started txid 3, but only successfully synced them to JN1 and JN3. And JN0 is abnormal, such as GC, bad network or restarted.
> - NN1's lastAppliedTxId is 2, and at the moment, we are trying failover active from NN0 to NN1. 
> - NN1 only got two responses from JN0 and JN1 when it try to selecting inputStreams with *fromTxnId=3*  and *onlyDurableTxns=true*, and the count txid of response is 0, 3 respectively. JN2 is abnormal, such as GC,  bad network or restarted.
> - NN1 will cannot replay any Edits with *fromTxnId=3* from JournalNodes because the *maxAllowedTxns* is 0.
> So I think Standby NameNode should *catchupDuringFailover()* with *onlyDurableTxns=false* , so that it can replay all missed edits from JournalNode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org