Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2022/04/22 09:21:30 UTC

[GitHub] [hadoop] tomscut opened a new pull request, #4219: HDFS-16557. BootstrapStandby failed because of checking Gap for inprogress EditLogInputStream

tomscut opened a new pull request, #4219:
URL: https://github.com/apache/hadoop/pull/4219

   JIRA: HDFS-16557.
   
   The lastTxId of an in-progress EditLogInputStream isn't necessarily HdfsServerConstants.INVALID_TXID. We can determine its status directly with EditLogInputStream#isInProgress.
   
   For example, during bootstrapStandby, an in-progress EditLogInputStream is not recognized as in-progress, so the gap check fails, which in turn causes bootstrapStandby to fail.
   
   ![image](https://user-images.githubusercontent.com/55134131/164676951-686f46ae-9b89-4be8-8d3c-41a08bb432ae.png)
   ![image](https://user-images.githubusercontent.com/55134131/164676977-bd3ece9d-3ffc-406f-8c06-aacdeac0dee8.png)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] tomscut commented on pull request #4219: HDFS-16557. BootstrapStandby failed because of checking gap for inprogress EditLogInputStream

Posted by GitBox <gi...@apache.org>.
tomscut commented on PR #4219:
URL: https://github.com/apache/hadoop/pull/4219#issuecomment-1113895505

   Hi @xkrogen , to make the change safe, we can change the condition from:
   `if(next == HdfsServerConstants.INVALID_TXID)`
   to
   `if(next == HdfsServerConstants.INVALID_TXID || elis.isInProgress())`
   
   Do you think it's necessary?




[GitHub] [hadoop] ZanderXu commented on pull request #4219: HDFS-16557. BootstrapStandby failed because of checking gap for inprogress EditLogInputStream

Posted by GitBox <gi...@apache.org>.
ZanderXu commented on PR #4219:
URL: https://github.com/apache/hadoop/pull/4219#issuecomment-1134096700

   Thanks @tomscut for your report. Is this similar to [HDFS-14806](https://issues.apache.org/jira/browse/HDFS-14806)?




[GitHub] [hadoop] tomscut commented on pull request #4219: HDFS-16557. BootstrapStandby failed because of checking gap for inprogress EditLogInputStream

Posted by GitBox <gi...@apache.org>.
tomscut commented on PR #4219:
URL: https://github.com/apache/hadoop/pull/4219#issuecomment-1149418728

   Hi @xkrogen , if you have enough bandwidth, please take a look. Thank you.




[GitHub] [hadoop] ZanderXu commented on pull request #4219: HDFS-16557. BootstrapStandby failed because of checking gap for inprogress EditLogInputStream

Posted by GitBox <gi...@apache.org>.
ZanderXu commented on PR #4219:
URL: https://github.com/apache/hadoop/pull/4219#issuecomment-1151885068

   Oh, I see: the root cause is that getJournaledEdits returns at most 5000 txids by default, and 1049842441 - 1049837441 = 5000.
   
   It can't reach 1050196644, so checkForGaps failed.
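
The arithmetic above can be checked with a small sketch. It is a simplified model rather than Hadoop code: `CappedFetchSketch` and its methods are hypothetical names, and it assumes the single capped fetch started at txid 1049837441, which the comment implies but does not state outright.

```java
// Simplified model of one capped fetch; not the real getJournaledEdits API.
class CappedFetchSketch {

    // Highest txid returned by one fetch that starts at sinceTxId and is capped at maxTxns.
    static long lastTxIdOfBatch(long sinceTxId, int maxTxns, long highestAvailableTxId) {
        long cappedEnd = sinceTxId + maxTxns - 1;
        return Math.min(cappedEnd, highestAvailableTxId);
    }

    // First txid that a single batch does not cover, or -1 if the target is reached.
    static long firstMissingTxId(long sinceTxId, int maxTxns, long targetTxId) {
        long last = lastTxIdOfBatch(sinceTxId, maxTxns, targetTxId);
        return last >= targetTxId ? -1L : last + 1;
    }

    public static void main(String[] args) {
        long since = 1049837441L;   // assumed start of the fetch
        long target = 1050196644L;  // txid that bootstrap expects to reach
        System.out.println("first missing txid: " + firstMissingTxId(since, 5000, target));
    }
}
```

Under that assumption, the first uncovered txid is the same 1049842441 that the gap check complains about.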




[GitHub] [hadoop] tomscut commented on pull request #4219: HDFS-16557. BootstrapStandby failed because of checking gap for inprogress EditLogInputStream

Posted by GitBox <gi...@apache.org>.
tomscut commented on PR #4219:
URL: https://github.com/apache/hadoop/pull/4219#issuecomment-1140184555

   Hi @ayushtkn , could you please also take a look at this. Thanks.




[GitHub] [hadoop] tomscut commented on pull request #4219: HDFS-16557. BootstrapStandby failed because of checking gap for inprogress EditLogInputStream

Posted by GitBox <gi...@apache.org>.
tomscut commented on PR #4219:
URL: https://github.com/apache/hadoop/pull/4219#issuecomment-1133511045

   Hi @xkrogen , please take a look if you have enough bandwidth. Thanks a lot.




[GitHub] [hadoop] ZanderXu commented on pull request #4219: HDFS-16557. BootstrapStandby failed because of checking gap for inprogress EditLogInputStream

Posted by GitBox <gi...@apache.org>.
ZanderXu commented on PR #4219:
URL: https://github.com/apache/hadoop/pull/4219#issuecomment-1151877997

   OK, back to the BootstrapStandby gap.
   From the stack trace, I understand that it tries to get streams from 1049842441 to 1050196644, but cannot find txid 1049842441 in the returned streams. 
   So I think we should trace the root cause: why can't we find txid 1049842441 in the result of `selectInputStreams(streams, 1049842441, true, true)`? 
   
   Please correct me if anything is wrong.




[GitHub] [hadoop] ZanderXu commented on pull request #4219: HDFS-16557. BootstrapStandby failed because of checking gap for inprogress EditLogInputStream

Posted by GitBox <gi...@apache.org>.
ZanderXu commented on PR #4219:
URL: https://github.com/apache/hadoop/pull/4219#issuecomment-1151901548

   > When we set dfs.ha.tail-edits.in-progress=true, the edits can be read by getJournaledEdits (there is actually no gap), but a gap exception is still thrown.
   
   I think there is a gap here, because bootstrap expects to reach txid 1050196644 but can't find it in the result. So throwing the gap exception is OK.




[GitHub] [hadoop] tomscut commented on pull request #4219: HDFS-16557. BootstrapStandby failed because of checking gap for inprogress EditLogInputStream

Posted by GitBox <gi...@apache.org>.
tomscut commented on PR #4219:
URL: https://github.com/apache/hadoop/pull/4219#issuecomment-1110588939

   Hi @ayushtkn @Hexiaoqiao @ferhui , could you please also take a look? Thanks a lot.




[GitHub] [hadoop] hadoop-yetus commented on pull request #4219: HDFS-16557. BootstrapStandby failed because of checking gap for inprogress EditLogInputStream

Posted by GitBox <gi...@apache.org>.
hadoop-yetus commented on PR #4219:
URL: https://github.com/apache/hadoop/pull/4219#issuecomment-1133564345

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 38s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  37m 58s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 43s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   1m 37s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 23s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 43s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 26s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 50s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 40s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m  5s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 26s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  0s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 24s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 32s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 20s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 10s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  | 256m 13s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4219/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 12s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 365m 17s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.hdfs.tools.TestDFSAdmin |
   |   | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl |
   |   | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4219/3/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4219 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 99d71330e64c 4.15.0-169-generic #177-Ubuntu SMP Thu Feb 3 10:50:38 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 7a37d9572f827a1af00a75ac93c4874a40c3eb07 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4219/3/testReport/ |
   | Max. process+thread count | 3850 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4219/3/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hadoop] ZanderXu commented on pull request #4219: HDFS-16557. BootstrapStandby failed because of checking gap for inprogress EditLogInputStream

Posted by GitBox <gi...@apache.org>.
ZanderXu commented on PR #4219:
URL: https://github.com/apache/hadoop/pull/4219#issuecomment-1150668804

   Thanks @tomscut , after tracing the code, I think we cannot add `elis.isInProgress()`.
   
   I will explain my reasoning through questions and answers. 
   **Question one: Why was INVALID_TXID considered in the original code?**
   - The checkForGaps method checks whether the streams contain continuous txids from fromTxId to toAtLeastTxid.
   - A lastTxId equal to INVALID_TXID means the stream is in progress.
   - toAtLeastTxid may be an abnormal value, such as Long.MAX_VALUE, so the checkForGaps method only needs to cover the latest in-progress segment.
   
   **Question two: What is the difference between INVALID_TXID and isInProgress()?**
   - Before [SBN READ] was introduced, a lastTxId equal to INVALID_TXID meant the stream was in progress, and an in-progress stream always had a lastTxId of INVALID_TXID.
   - After [SBN READ] was introduced, a lastTxId equal to INVALID_TXID still means the stream is in progress, but an in-progress stream no longer implies that its lastTxId is INVALID_TXID, because getJournaledEdits was introduced.
   - So if we add `elis.isInProgress()` in checkForGaps, it cannot cover the last segment being written, which actually contains the latest edits.
   
   Please correct me if anything is wrong.
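
The concern in the answers above can be illustrated with a simplified model of the gap check. This is a sketch, not the actual `FSEditLog#checkForGaps`: the `Stream` class, the `firstGap` method, and the `alsoSkipInProgress` switch are hypothetical, and the `INVALID_TXID` value is only a placeholder for the constant in `HdfsServerConstants`.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of a gap check over edit log streams; names are illustrative only.
class GapCheckSketch {
    static final long INVALID_TXID = -12345L; // placeholder sentinel for "end unknown"

    static final class Stream {
        final long firstTxId;
        final long lastTxId;
        final boolean inProgress;

        Stream(long firstTxId, long lastTxId, boolean inProgress) {
            this.firstTxId = firstTxId;
            this.lastTxId = lastTxId;
            this.inProgress = inProgress;
        }
    }

    // Returns -1 if no gap is detected, otherwise the first txid that cannot be found.
    // alsoSkipInProgress switches between the original condition and the proposed one.
    static long firstGap(List<Stream> streams, long fromTxId, long toAtLeastTxId,
                         boolean alsoSkipInProgress) {
        long txId = fromTxId;
        for (Stream s : streams) {
            if (txId > toAtLeastTxId) {
                return -1L;
            }
            if (s.firstTxId > txId) {
                break; // the next stream starts too late: txId is missing
            }
            long next = s.lastTxId;
            if (next == INVALID_TXID || (alsoSkipInProgress && s.inProgress)) {
                return -1L; // treated as open-ended: assumed to reach toAtLeastTxId
            }
            txId = next + 1;
        }
        return txId > toAtLeastTxId ? -1L : txId;
    }

    public static void main(String[] args) {
        // A finalized segment followed by an in-progress stream with a concrete lastTxId.
        List<Stream> streams = Arrays.asList(
            new Stream(1, 100, false),
            new Stream(101, 150, true));
        System.out.println("original condition: " + firstGap(streams, 1, 200, false));
        System.out.println("with isInProgress:  " + firstGap(streams, 1, 200, true));
    }
}
```

In this model, the original condition reports txid 151 as missing when 200 is required, while the extended condition returns early and reports no gap even though the bounded in-progress stream stops at 150.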
   
   
   




[GitHub] [hadoop] tomscut commented on pull request #4219: HDFS-16557. BootstrapStandby failed because of checking gap for inprogress EditLogInputStream

Posted by GitBox <gi...@apache.org>.
tomscut commented on PR #4219:
URL: https://github.com/apache/hadoop/pull/4219#issuecomment-1151882369

   > OK, back to the BootstrapStandby gap. From the stack trace, I understand that it tries to get streams from 1049842441 to 1050196644, but cannot find txid 1049842441 in the returned streams. So I think we should trace the root cause: why can't we find txid 1049842441 in the result of `selectInputStreams(streams, 1049842441, true, true)`?
   > 
   > Please correct me if anything is wrong.
   
   Please refer to the discussion with @xkrogen above. 
   
   The root cause is that the `if` branch (`if(next == HdfsServerConstants.INVALID_TXID)`) is not entered when it should be.




[GitHub] [hadoop] ZanderXu commented on pull request #4219: HDFS-16557. BootstrapStandby failed because of checking gap for inprogress EditLogInputStream

Posted by GitBox <gi...@apache.org>.
ZanderXu commented on PR #4219:
URL: https://github.com/apache/hadoop/pull/4219#issuecomment-1151890746

   As I explained above, changing the condition to `if (next == HdfsServerConstants.INVALID_TXID || elis.isInProgress())` may change the original semantics of the `checkForGaps` method.
   
   Do you have any questions about my explanation? 😁  Let's discuss together and become more familiar with the relevant logic.




[GitHub] [hadoop] tomscut commented on pull request #4219: HDFS-16557. BootstrapStandby failed because of checking gap for inprogress EditLogInputStream

Posted by GitBox <gi...@apache.org>.
tomscut commented on PR #4219:
URL: https://github.com/apache/hadoop/pull/4219#issuecomment-1148120968

   Hi @jojochuang @tasanuma @Hexiaoqiao , could you please also take a look. Thanks.




[GitHub] [hadoop] tomscut commented on a diff in pull request #4219: HDFS-16557. BootstrapStandby failed because of checking gap for inprogress EditLogInputStream

Posted by GitBox <gi...@apache.org>.
tomscut commented on code in PR #4219:
URL: https://github.com/apache/hadoop/pull/4219#discussion_r862281008


##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java:
##########
@@ -1792,7 +1792,7 @@ private void checkForGaps(List<EditLogInputStream> streams, long fromTxId,
       EditLogInputStream elis = iter.next();
       if (elis.getFirstTxId() > txId) break;
       long next = elis.getLastTxId();

Review Comment:
   Thanks @xkrogen for your suggestion.





[GitHub] [hadoop] ZanderXu commented on pull request #4219: HDFS-16557. BootstrapStandby failed because of checking gap for inprogress EditLogInputStream

Posted by GitBox <gi...@apache.org>.
ZanderXu commented on PR #4219:
URL: https://github.com/apache/hadoop/pull/4219#issuecomment-1151887486

   So in this case, we should change the bootstrap logic.
   Solution one: set DFS_HA_TAILEDITS_INPROGRESS_KEY to false.
   Solution two: call getJournaledEdits repeatedly until the latest txid is reached, and only then run checkForGaps.
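
Solution two could look roughly like the loop below. This is only a sketch of the idea: `fetchBatchEnd` is a hypothetical stand-in for one capped fetch, not the real getJournaledEdits RPC, and the batch size and journal state are passed in as plain numbers.

```java
// Sketch of "fetch repeatedly until the target txid is reached"; names are illustrative only.
class RepeatedFetchSketch {

    // Hypothetical stand-in for one capped fetch: returns the last txid of the batch.
    static long fetchBatchEnd(long sinceTxId, int maxTxns, long journalHighestTxId) {
        return Math.min(sinceTxId + maxTxns - 1, journalHighestTxId);
    }

    // Keeps fetching until targetTxId is covered and returns the number of fetches used.
    // Throws if the journal runs out before the target, which is a genuine gap.
    static int callsToReach(long fromTxId, long targetTxId, int maxTxns,
                            long journalHighestTxId) {
        int calls = 0;
        long next = fromTxId;
        while (next <= targetTxId) {
            long end = fetchBatchEnd(next, maxTxns, journalHighestTxId);
            if (end < next) {
                throw new IllegalStateException("journal has no txid " + next);
            }
            calls++;
            next = end + 1;
        }
        return calls;
    }

    public static void main(String[] args) {
        // Small numbers for readability: txids 1..12 with at most 5 per fetch.
        System.out.println("fetches needed: " + callsToReach(1, 12, 5, 12));
    }
}
```

The gap check would then run only after the loop finishes, so a capped batch size alone no longer looks like a gap.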




[GitHub] [hadoop] tomscut commented on pull request #4219: HDFS-16557. BootstrapStandby failed because of checking gap for inprogress EditLogInputStream

Posted by GitBox <gi...@apache.org>.
tomscut commented on PR #4219:
URL: https://github.com/apache/hadoop/pull/4219#issuecomment-1107679485

   Hi @tasanuma @xkrogen @sunchao , could you please have a look? Thanks a lot.




[GitHub] [hadoop] tomscut commented on pull request #4219: HDFS-16557. BootstrapStandby failed because of checking gap for inprogress EditLogInputStream

Posted by GitBox <gi...@apache.org>.
tomscut commented on PR #4219:
URL: https://github.com/apache/hadoop/pull/4219#issuecomment-1113892494

   > This seems right to me, but I don't fully understand what went wrong to cause the error. Can you explain more fully? Why did we previously make the assumption that `INVALID_TXID` meant in-progress, and what has changed to make that not true / what happened in your specific scenario to cause that not to be true?
   
   Thank you very much @xkrogen for your review.
   
   After introducing [SBN READ], we updated the configuration: `dfs.ha.tail-edits.in-progress=true`.
   
   Then, when we run `bootstrapStandby`, we encounter something like this:
   1. We need to start an Observer NameNode, so we run bootstrapStandby before starting it. This automatically pulls the latest FSImage from the Active NameNode and checks whether the edits in the journals have a gap relative to the `lastTxid` of the FSImage.
   
   2. Assume that the txid of the latest FSImage is x and that the edit logs from x onward in the journals are in the `InProgress` state. The in-progress branch of `FSEditLog#checkForGaps` is then skipped, because the `lastTxid` of the InProgress EditLogInputStream is not `HdfsServerConstants.INVALID_TXID` but a specific number.  
   
   3. However, when there is a finalized edit log between x and the txid currently being written, `bootstrapStandby` can execute normally.
   
   The `lastTxId` of an InProgress EditLogInputStream isn't always `HdfsServerConstants.INVALID_TXID`; it can also be a specific number.
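
The mismatch described in step 2 comes down to two different ways of asking whether a stream is in progress. The sketch below uses a hypothetical `Stream` class and a placeholder `INVALID_TXID` value rather than the real `EditLogInputStream` and `HdfsServerConstants`, so treat it as an illustration only.

```java
// Two ways to detect an in-progress stream; a simplified model, not Hadoop code.
class InProgressDetectionSketch {
    static final long INVALID_TXID = -12345L; // placeholder sentinel

    static final class Stream {
        final long firstTxId;
        final long lastTxId;
        final boolean inProgress;

        Stream(long firstTxId, long lastTxId, boolean inProgress) {
            this.firstTxId = firstTxId;
            this.lastTxId = lastTxId;
            this.inProgress = inProgress;
        }
    }

    // Detection used by the existing condition: relies on the sentinel lastTxId.
    static boolean detectedBySentinel(Stream s) {
        return s.lastTxId == INVALID_TXID;
    }

    // Detection proposed in this PR: asks the stream directly.
    static boolean detectedByFlag(Stream s) {
        return s.inProgress;
    }

    public static void main(String[] args) {
        // An in-progress stream that already carries a concrete lastTxId.
        Stream bounded = new Stream(101, 150, true);
        System.out.println("sentinel check: " + detectedBySentinel(bounded));
        System.out.println("flag check:     " + detectedByFlag(bounded));
    }
}
```

For an in-progress stream that carries a concrete last txid, only the flag-based check recognizes it as in progress.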




[GitHub] [hadoop] tomscut commented on pull request #4219: HDFS-16557. BootstrapStandby failed because of checking gap for inprogress EditLogInputStream

Posted by GitBox <gi...@apache.org>.
tomscut commented on PR #4219:
URL: https://github.com/apache/hadoop/pull/4219#issuecomment-1134494753

   > Thanks @tomscut for your report. Is this similar to [HDFS-14806](https://issues.apache.org/jira/browse/HDFS-14806)?
   
   Thanks @ZanderXu for your comments. Setting DFS_HA_TAILEDITS_INPROGRESS_KEY to false could solve the problem, but correcting the logic that identifies an in-progress EditLogInputStream seems more reasonable. 




[GitHub] [hadoop] tomscut commented on pull request #4219: HDFS-16557. BootstrapStandby failed because of checking gap for inprogress EditLogInputStream

Posted by GitBox <gi...@apache.org>.
tomscut commented on PR #4219:
URL: https://github.com/apache/hadoop/pull/4219#issuecomment-1151852402

   > Thanks @tomscut , after tracing the code, I think we cannot add `elis.isInProgress()`.
   > 
   > I will explain my reasoning through questions and answers. **Question one: Why was INVALID_TXID considered in the original code?**
   > 
   > * The checkForGaps method checks whether the streams contain continuous txids from fromTxId to toAtLeastTxid.
   > * A lastTxId equal to INVALID_TXID means the stream is in progress.
   > * toAtLeastTxid may be an abnormal value, such as Long.MAX_VALUE, so the checkForGaps method only needs to cover the latest in-progress segment.
   > 
   > **Question two: What is the difference between INVALID_TXID and isInProgress()?**
   > 
   > * Before [SBN READ] was introduced, a lastTxId equal to INVALID_TXID meant the stream was in progress, and an in-progress stream always had a lastTxId of INVALID_TXID.
   > * After [SBN READ] was introduced, a lastTxId equal to INVALID_TXID still means the stream is in progress, but an in-progress stream no longer implies that its lastTxId is INVALID_TXID, because getJournaledEdits was introduced.
   > * So if we add `elis.isInProgress()` in checkForGaps, it cannot cover the last segment being written, which actually contains the latest edits.
   > 
   > Please correct me if anything is wrong.
   
   Thanks @ZanderXu for your comment. Please refer to the stack.
   ![image](https://user-images.githubusercontent.com/55134131/172977547-16c0bf94-8586-4f41-be8e-ce1e4dd41eae.png)
   
   When we set `dfs.ha.tail-edits.in-progress=true`, the txid can be read by getJournaledEdits (there is actually no gap), but a gap exception is still thrown.
   




[GitHub] [hadoop] tomscut commented on pull request #4219: HDFS-16557. BootstrapStandby failed because of checking gap for inprogress EditLogInputStream

Posted by GitBox <gi...@apache.org>.
tomscut commented on PR #4219:
URL: https://github.com/apache/hadoop/pull/4219#issuecomment-1107679378

   Thanks @ashutoshcipher for your review.




[GitHub] [hadoop] hadoop-yetus commented on pull request #4219: HDFS-16557. BootstrapStandby failed because of checking gap for inprogress EditLogInputStream

Posted by GitBox <gi...@apache.org>.
hadoop-yetus commented on PR #4219:
URL: https://github.com/apache/hadoop/pull/4219#issuecomment-1106630776

   :broken_heart: **-1 overall**
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |  12m 53s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  38m 52s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 43s |  |  trunk passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   1m 37s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 25s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 44s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 24s |  |  trunk passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 47s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 40s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 10s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 23s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 25s |  |  the patch passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   1m 25s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 20s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  0s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 59s |  |  the patch passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 34s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 19s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 40s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 248m 17s |  |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 13s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 371m  9s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4219/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4219 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 762330a41f0c 4.15.0-169-generic #177-Ubuntu SMP Thu Feb 3 10:50:38 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / abd27315b22d09607203cbe3ff1fbb9ed8b47ca2 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4219/1/testReport/ |
   | Max. process+thread count | 3553 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4219/1/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


[GitHub] [hadoop] hadoop-yetus commented on pull request #4219: HDFS-16557. BootstrapStandby failed because of checking gap for inprogress EditLogInputStream

Posted by GitBox <gi...@apache.org>.
hadoop-yetus commented on PR #4219:
URL: https://github.com/apache/hadoop/pull/4219#issuecomment-1113949271

   :broken_heart: **-1 overall**
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |  12m 59s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  37m 45s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 42s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   1m 36s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 26s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 45s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 26s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 50s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 41s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m  3s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 25s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 25s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   1m 25s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 21s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  2s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 30s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 23s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 16s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 247m 28s |  |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 14s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 368m 40s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4219/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4219 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux e1b95ddc93cd 4.15.0-169-generic #177-Ubuntu SMP Thu Feb 3 10:50:38 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 6e602e9e05a17e8ee69124dec08a81a96eab627e |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4219/2/testReport/ |
   | Max. process+thread count | 3067 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4219/2/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


[GitHub] [hadoop] xkrogen commented on a diff in pull request #4219: HDFS-16557. BootstrapStandby failed because of checking gap for inprogress EditLogInputStream

Posted by GitBox <gi...@apache.org>.
xkrogen commented on code in PR #4219:
URL: https://github.com/apache/hadoop/pull/4219#discussion_r862193291


##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java:
##########
@@ -1792,7 +1792,7 @@ private void checkForGaps(List<EditLogInputStream> streams, long fromTxId,
       EditLogInputStream elis = iter.next();
       if (elis.getFirstTxId() > txId) break;
       long next = elis.getLastTxId();

Review Comment:
   The local variable is now redundant; we can just update L1805 to be:
   ```java
   txId = elis.getLastTxId() + 1;
   ```


