You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "chunhui shen (Created) (JIRA)" <ji...@apache.org> on 2012/03/31 06:58:43 UTC

[jira] [Created] (HBASE-5689) Skip RecoveredEdits may cause data loss

Skip RecoveredEdits may cause data loss
---------------------------------------

                 Key: HBASE-5689
                 URL: https://issues.apache.org/jira/browse/HBASE-5689
             Project: HBase
          Issue Type: Bug
          Components: regionserver
            Reporter: chunhui shen
            Assignee: chunhui shen


Let's see the following scenario:
1.Region is on the server A
2.put KV(r1->v1) to the region
3.move region from server A to server B
4.put KV(r2->v2) to the region
5.move region from server B to server A
6.put KV(r3->v3) to the region
7.kill -9 server B and start it
8.kill -9 server A and start it 
9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.

Let's analyse the upper scenario from the code:
1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
3.however, RecoveredEdits file f2 will be skiped when initializing region
HRegion#replayRecoveredEditsIfAny
{code}
 for (Path edits: files) {
      if (edits == null || !this.fs.exists(edits)) {
        LOG.warn("Null or non-existent edits file: " + edits);
        continue;
      }
      if (isZeroLengthThenDelete(this.fs, edits)) continue;

      if (checkSafeToSkip) {
        Path higher = files.higher(edits);
        long maxSeqId = Long.MAX_VALUE;
        if (higher != null) {
          // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
          String fileName = higher.getName();
          maxSeqId = Math.abs(Long.parseLong(fileName));
        }
        if (maxSeqId <= minSeqId) {
          String msg = "Maximum possible sequenceid for this log is " + maxSeqId
              + ", skipped the whole file, path=" + edits;
          LOG.debug(msg);
          continue;
        } else {
          checkSafeToSkip = false;
        }
      }
{code}

 


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5689) Skip RecoveredEdits may cause data loss

Posted by "chunhui shen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-5689:
--------------------------------

    Attachment: 5689-testcase.patch

I have written a test case for the issue to make a data loss scenario when skip RecoveredEdits.
                
> Skip RecoveredEdits may cause data loss
> ---------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>         Attachments: 5689-testcase.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244719#comment-13244719 ] 

Lars Hofhansl commented on HBASE-5689:
--------------------------------------

+1 on patch.
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-simplified.txt, 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (HBASE-5689) Skip RecoveredEdits may cause data loss

Posted by "ramkrishna.s.vasudevan (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243057#comment-13243057 ] 

ramkrishna.s.vasudevan edited comment on HBASE-5689 at 3/31/12 6:53 AM:
------------------------------------------------------------------------

@Chunhui

Good test case.  Yes able to reproduce the problem :).. Just try to understand more on what is happening there.  
{Edit}Good test case.  Yes able to reproduce the problem :).. Just trying to understand more on what is happening there.  
                
      was (Author: ram_krish):
    @Chunhui

Good test case.  Yes able to reproduce the problem :).. Just try to understand more on what is happening there.  
                  
> Skip RecoveredEdits may cause data loss
> ---------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>         Attachments: 5689-testcase.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5689:
------------------------------

    Summary: Skipping RecoveredEdits may cause data loss  (was: Skip RecoveredEdits may cause data loss)
    
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>         Attachments: 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249285#comment-13249285 ] 

Zhihong Yu commented on HBASE-5689:
-----------------------------------

Integrated patch v3 to 0.94 and trunk.

Thanks for the patch Chunhui.

Thanks for the review, Ram, Lars, Stack and Ming.
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, 5689-v4.txt, HBASE-5689.patch, HBASE-5689.patch, HBASE-5689v2.patch, HBASE-5689v3.patch, HBASE-5689v3.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5689:
------------------------------

    Attachment: HBASE-5689v3.patch

Re-attaching patch v3 for hadoop QA.
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, 5689-v4.txt, HBASE-5689.patch, HBASE-5689.patch, HBASE-5689v2.patch, HBASE-5689v3.patch, HBASE-5689v3.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skip RecoveredEdits may cause data loss

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243057#comment-13243057 ] 

ramkrishna.s.vasudevan commented on HBASE-5689:
-----------------------------------------------

@Chunhui

Good test case.  Yes able to reproduce the problem :).. Just try to understand more on what is happening there.  
                
> Skip RecoveredEdits may cause data loss
> ---------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>         Attachments: 5689-testcase.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skip RecoveredEdits may cause data loss

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243154#comment-13243154 ] 

Zhihong Yu commented on HBASE-5689:
-----------------------------------

>From the above analysis, it looks like this bug was caused by the optimization made in HBASE-4797.
                
> Skip RecoveredEdits may cause data loss
> ---------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>         Attachments: 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Ming Ma (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247960#comment-13247960 ] 

Ming Ma commented on HBASE-5689:
--------------------------------

Really nice finding and fix. Some comments regarding performance:

1. replayRecoveredEditsIfAny still sorts and uses the sequence id of next recoverededit file as the max sequence id of the current recoverededit file. With this fix, is that still necessary? replayRecoveredEditsIfAny can simply walk through all the recoverededit files and skip those files that have "max sequence id" <= "current sequence id of the region". 
2. If we have multiple recoverededit files for a given region, is it safe to say that "In maximum, there could only be one recoverededit file whose max sequence id is larger than the current sequence id of the region"? If yes, replayRecoveredEditsIfAny can skip the rest of recoverededit files once it detects that special recoverededit file.
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, HBASE-5689.patch, HBASE-5689.patch, HBASE-5689v2.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243743#comment-13243743 ] 

Ted Yu commented on HBASE-5689:
-------------------------------

Using a TreeMap is common practice.

Please attach test suite result - Hadoop QA is not working.
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-simplified.txt, 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "chunhui shen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243625#comment-13243625 ] 

chunhui shen commented on HBASE-5689:
-------------------------------------

@Ram
The analysis is correct.
However, I think your solution is not use the optimization made in HBASE-4797.
In my  HBASE-5689.patch, I just change the file name of edit log to MaximumEditLogSeqNum。
In this issue, with the patch it will not skip RecoveredEdits file because we shouldn't, however, in other case, for example, we first put many data to RS1, then move regon to RS2 and put many data again. If RS1 and RS2 died, we would skip the edit log file from RS1.

bq.Chunhui's patch also makes sense but any way the idea is not to skip any recovered.edits file.
So, it is wrong, my way is keeping skip recovered.edits file, except the case such as this issue.
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-simplified.txt, 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "chunhui shen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-5689:
--------------------------------

    Attachment: HBASE-5689v3.patch

In the patchV3, append the optimize suggested by Ming Ma :replayRecoveredEditsIfAny simply walk through all the recoverededit files and skip those files that have "max sequence id" <= "current sequence id of the region".
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, HBASE-5689.patch, HBASE-5689.patch, HBASE-5689v2.patch, HBASE-5689v3.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248033#comment-13248033 ] 

Zhihong Yu commented on HBASE-5689:
-----------------------------------

Consider my comment above and the following method in HLogSplitter:
{code}
  static String formatRecoveredEditsFileName(final long seqid) {
    return String.format("%019d", seqid);
  }
{code}
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, 5689-v4.txt, HBASE-5689.patch, HBASE-5689.patch, HBASE-5689v2.patch, HBASE-5689v3.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5689:
---------------------------------

    Fix Version/s: 0.94.0
    
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-simplified.txt, 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skip RecoveredEdits may cause data loss

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243109#comment-13243109 ] 

ramkrishna.s.vasudevan commented on HBASE-5689:
-----------------------------------------------

I just want to understand here like removing the 
{code}
      if (maxSeqId <= minSeqId) {
          String msg = "Maximum possible sequenceid for this log is " + maxSeqId
              + ", skipped the whole file, path=" + edits;
          LOG.debug(msg);
          continue;
{code}
'continue' here should solve the problem.  Inside replayRecoveredEdits we have this code
{code}
         // Now, figure if we should skip this edit.
          if (key.getLogSeqNum() <= currentEditSeqId) {
            skippedEdits++;
            continue;
          }
{code}
which will any way skip unnecessary edits.  I tried by commenting the 'continue' in the code and the test case that you gave passed.  It is my thought, your comments are welcome.
                
> Skip RecoveredEdits may cause data loss
> ---------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>         Attachments: 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5689:
------------------------------

    Attachment: 5689-v4.txt

Patch v4 removes Math.abs() call.
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, 5689-v4.txt, HBASE-5689.patch, HBASE-5689.patch, HBASE-5689v2.patch, HBASE-5689v3.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "chunhui shen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-5689:
--------------------------------

    Attachment: HBASE-5689v2.patch

Sorry for the later response.

In the patchV2, I add some doc for getCompletedRecoveredEditsFilePath() as the comment by Stack. All the other is same with patchV1.

Please review again.
Thanks!
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, HBASE-5689.patch, HBASE-5689.patch, HBASE-5689v2.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247018#comment-13247018 ] 

Lars Hofhansl commented on HBASE-5689:
--------------------------------------

@chunhui: Could you respond to Stack's last comment? I would like to get 0.94.0RC1 out soon.
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, HBASE-5689.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "chunhui shen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247993#comment-13247993 ] 

chunhui shen commented on HBASE-5689:
-------------------------------------

bq.The Math.abs() call is not needed, right ?

I think so, but I don't know why Math.abs() is needed before.
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, HBASE-5689.patch, HBASE-5689.patch, HBASE-5689v2.patch, HBASE-5689v3.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247017#comment-13247017 ] 

Lars Hofhansl commented on HBASE-5689:
--------------------------------------

All except
org.apache.hadoop.hbase.mapreduce.TestImportTsv.testMROnTable and org.apache.hadoop.hbase.mapreduce.TestImportTsv.testMROnTableWithCustomMapper fail with the known NumberFormatException.

                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, HBASE-5689.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5689) Skip RecoveredEdits may cause data loss

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5689:
------------------------------

    Affects Version/s: 0.94.0
    
> Skip RecoveredEdits may cause data loss
> ---------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>         Attachments: 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246748#comment-13246748 ] 

Hadoop QA commented on HBASE-5689:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12521382/HBASE-5689.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper
                  org.apache.hadoop.hbase.mapreduce.TestImportTsv
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
                  org.apache.hadoop.hbase.mapreduce.TestTableMapReduce

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1390//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1390//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1390//console

This message is automatically generated.
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, HBASE-5689.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249476#comment-13249476 ] 

Hudson commented on HBASE-5689:
-------------------------------

Integrated in HBase-TRUNK-security #162 (See [https://builds.apache.org/job/HBase-TRUNK-security/162/])
    HBASE-5689 Skipping RecoveredEdits may cause data loss (Chunhui) (Revision 1310788)

     Result = FAILURE
tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java

                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, 5689-v4.txt, HBASE-5689.patch, HBASE-5689.patch, HBASE-5689v2.patch, HBASE-5689v3.patch, HBASE-5689v3.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5689:
------------------------------

    Attachment:     (was: 5689-simplified.txt)
    
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253623#comment-13253623 ] 

Hudson commented on HBASE-5689:
-------------------------------

Integrated in HBase-0.94-security #9 (See [https://builds.apache.org/job/HBase-0.94-security/9/])
    HBASE-5689 Skipping RecoveredEdits may cause data loss (Chunhui) (Revision 1310787)

     Result = SUCCESS
tedyu : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java

                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, 5689-v4.txt, HBASE-5689.patch, HBASE-5689.patch, HBASE-5689v2.patch, HBASE-5689v3.patch, HBASE-5689v3.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skip RecoveredEdits may cause data loss

Posted by "chunhui shen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243070#comment-13243070 ] 

chunhui shen commented on HBASE-5689:
-------------------------------------

rather than the current MinimumEditLogSeqNum as the file name.

                
> Skip RecoveredEdits may cause data loss
> ---------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>         Attachments: 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244418#comment-13244418 ] 

stack commented on HBASE-5689:
------------------------------

Good one Chunhui.  I think the patch good.

Nice reproduction of the problem in a test.  Where in the test do you find that we've lost the third edit?

So we name the file when we write it for its first edit, then when we move it into place, we rename it to be by last edit in the file?  Add a comment to that effect I'd say else could be confusing.  Hmm... I suppose you have it here on the doc for getCompletedRecoveredEditsFilePath.  Thats probably good enough.. but no harm explaining why we go from naming file w/ first edit to instead name it for the last edit.


                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-simplified.txt, 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248018#comment-13248018 ] 

Zhihong Yu commented on HBASE-5689:
-----------------------------------

Please take a look at the updated javadoc for getCompletedRecoveredEditsFilePath():
{code}
+   * Get the completed recovered edits file path, renaming it to be by last edit
+   * in the file from its first edit. Then we could use the name to skip
+   * recovered edits when doing {@link HRegion#replayRecoveredEditsIfAny}.
{code}
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, HBASE-5689.patch, HBASE-5689.patch, HBASE-5689v2.patch, HBASE-5689v3.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247986#comment-13247986 ] 

Hadoop QA commented on HBASE-5689:
----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12521627/HBASE-5689v2.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1419//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1419//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1419//console

This message is automatically generated.
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, HBASE-5689.patch, HBASE-5689.patch, HBASE-5689v2.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248013#comment-13248013 ] 

Lars Hofhansl commented on HBASE-5689:
--------------------------------------

Thanks Chunhui. Patch looks good to me. +1
Re: Math.abs(), the comment in the code says that the file name pattern is -?[0-9]+, so negative is possible?
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, HBASE-5689.patch, HBASE-5689.patch, HBASE-5689v2.patch, HBASE-5689v3.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5689:
------------------------------

      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)
    
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, 5689-v4.txt, HBASE-5689.patch, HBASE-5689.patch, HBASE-5689v2.patch, HBASE-5689v3.patch, HBASE-5689v3.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "chunhui shen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243675#comment-13243675 ] 

chunhui shen commented on HBASE-5689:
-------------------------------------

bq.Is a TreeMap needed above ? We're just remembering the mapping, right ?
I first used ConcurrentHashMap, but for the same region name, they mapped to different values because of byte[]
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-simplified.txt, 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249283#comment-13249283 ] 

Hadoop QA commented on HBASE-5689:
----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12521820/HBASE-5689v3.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1445//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1445//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1445//console

This message is automatically generated.
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, 5689-v4.txt, HBASE-5689.patch, HBASE-5689.patch, HBASE-5689v2.patch, HBASE-5689v3.patch, HBASE-5689v3.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5689) Skip RecoveredEdits may cause data loss

Posted by "chunhui shen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-5689:
--------------------------------

    Attachment: HBASE-5689.patch

In the patch,
I make the region's MaximumEditLogSeqNum in the RecoveredEdit file as the file name.
                
> Skip RecoveredEdits may cause data loss
> ---------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>         Attachments: 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5689:
---------------------------------

    Attachment: HBASE-5689.patch

Reattaching for test run.
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, HBASE-5689.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243983#comment-13243983 ] 

ramkrishna.s.vasudevan commented on HBASE-5689:
-----------------------------------------------

@Chunhui
+1 on patch Chunhui.  I was bit hesitant to change that main logic.  But after checking different scenarios this change is necessary to have both performance and to address this JIRA.  Good one.  Thanks a lot. :)
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-simplified.txt, 5689-testcase.patch, HBASE-5689.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5689:
---------------------------------

    Status: Patch Available  (was: Open)
    
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, HBASE-5689.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243648#comment-13243648 ] 

Zhihong Yu commented on HBASE-5689:
-----------------------------------

Some minor comments about Chunhui's patch:
{code}
+          LOG.debug("Rename " + wap.p + " to " + dst);
{code}
The log should begin with "Renamed ".
{code}
+    private final Map<byte[], Long> regionMaximumEditLogSeqNum = Collections
+        .synchronizedMap(new TreeMap<byte[], Long>(Bytes.BYTES_COMPARATOR));
{code}
Is a TreeMap needed above ? We're just remembering the mapping, right ?
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-simplified.txt, 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243204#comment-13243204 ] 

ramkrishna.s.vasudevan commented on HBASE-5689:
-----------------------------------------------

@Ted
What do you feel Ted? Chunhui's patch also makes sense but any way the idea is not to skip any recovered.edits file.  So i felt removing the check will be enough.

                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>         Attachments: 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248421#comment-13248421 ] 

stack commented on HBASE-5689:
------------------------------

@Ming (Hey, long time no hear from you).

On 1., yes, sounds good.  I do not think 2. holds.  It is more robust replaying any file that is > 'current max seqid of the region' at least until we do more thinking on this topic.  The WAL split can be messy or just as bad, we can be inventive around how we might split WALs and how we might write files to replay. 
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, 5689-v4.txt, HBASE-5689.patch, HBASE-5689.patch, HBASE-5689v2.patch, HBASE-5689v3.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-5689:
--------------------------

    Attachment: HBASE-5689.patch

Re-attaching Chunhui's patch
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-simplified.txt, 5689-testcase.patch, HBASE-5689.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248031#comment-13248031 ] 

Hadoop QA commented on HBASE-5689:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12521635/HBASE-5689v3.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
     

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1422//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1422//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1422//console

This message is automatically generated.
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, 5689-v4.txt, HBASE-5689.patch, HBASE-5689.patch, HBASE-5689v2.patch, HBASE-5689v3.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "chunhui shen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247972#comment-13247972 ] 

chunhui shen commented on HBASE-5689:
-------------------------------------

@Ming
I like the optimize that we simply walk through all the recoverededit files and skip those files that have "max sequence id" <= "current sequence id of the region".

For the second point, I think it's right and the first optimize is enough and  simle to understand
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, HBASE-5689.patch, HBASE-5689.patch, HBASE-5689v2.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244233#comment-13244233 ] 

Zhihong Yu commented on HBASE-5689:
-----------------------------------

@Chunhui:
Hadoop QA isn't picking up any patches from this JIRA.

Please run through test suite and let us know the result.
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-simplified.txt, 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247992#comment-13247992 ] 

Zhihong Yu commented on HBASE-5689:
-----------------------------------

For patch v3:
{code}
+      maxSeqId = Math.abs(Long.parseLong(fileName));
{code}
The Math.abs() call is not needed, right ?
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, HBASE-5689.patch, HBASE-5689.patch, HBASE-5689v2.patch, HBASE-5689v3.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (HBASE-5689) Skip RecoveredEdits may cause data loss

Posted by "Zhihong Yu (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243057#comment-13243057 ] 

Zhihong Yu edited comment on HBASE-5689 at 3/31/12 2:00 PM:
------------------------------------------------------------

@Chunhui

Good test case.  Yes able to reproduce the problem :).. Just trying to understand more on what is happening there.  
                
      was (Author: ram_krish):
    @Chunhui

Good test case.  Yes able to reproduce the problem :).. Just try to understand more on what is happening there.  
{Edit}Good test case.  Yes able to reproduce the problem :).. Just trying to understand more on what is happening there.  
                  
> Skip RecoveredEdits may cause data loss
> ---------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>         Attachments: 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "ramkrishna.s.vasudevan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5689:
------------------------------------------

    Priority: Critical  (was: Major)

Making it critical as it is data loss related.
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>         Attachments: 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248029#comment-13248029 ] 

Lars Hofhansl commented on HBASE-5689:
--------------------------------------

How do you know it's safe to remove Math.abs()?

                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, 5689-v4.txt, HBASE-5689.patch, HBASE-5689.patch, HBASE-5689v2.patch, HBASE-5689v3.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248457#comment-13248457 ] 

stack commented on HBASE-5689:
------------------------------

Looking at the patch, it looks good to me.

On the Math.abs, its probably not needed any more given current file formats but its not doing any harm.  I tried to go back through src history to see how we used name the recovered edits files -- they used be called stuff like oldlogfile.log, etc -- and they have been named for first seqid in the file since 0.90 so its unlikely this will ever go negative or that this code will encouter a file named negatively.

I also thought that perhaps this patch only for trunk because what about the case where we have existing wal files that are still named by the first seqid in the file?  But this should be fine.  We'll over-replay rather than under-replay edits in this case so it should be fine.

+1 on commit adding back the Math.abs on commit (though reasoning would have it we don't need, I think we should add it back just-in-case).
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, 5689-v4.txt, HBASE-5689.patch, HBASE-5689.patch, HBASE-5689v2.patch, HBASE-5689v3.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248058#comment-13248058 ] 

Hadoop QA commented on HBASE-5689:
----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12521642/5689-v4.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1425//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1425//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1425//console

This message is automatically generated.
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, 5689-v4.txt, HBASE-5689.patch, HBASE-5689.patch, HBASE-5689v2.patch, HBASE-5689v3.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5689:
------------------------------

    Attachment: 5689-simplified.txt

Removing the check should be enough.

With the attached patch, TestHRegion#testDataCorrectnessReplayingRecoveredEdits passes.
                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>         Attachments: 5689-simplified.txt, 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skip RecoveredEdits may cause data loss

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243106#comment-13243106 ] 

ramkrishna.s.vasudevan commented on HBASE-5689:
-----------------------------------------------

@Chunhui
Thanks for the patch.
I would like to tell my analysis in this
-> Suppose the current  seq for RS 1 is 4
When the first row kv was inserted KV(r1->v1), the current seq is 5.
On moving the region to RS2 the store gets flushed and when RS2 opens the region the next seq he can use will be 6.
Now we make the next kv entry KV(r2->v1), now the current seq for this entry is 6. Now when the region is moved again to RS1, another store file is created by RS2.
Now when RS1 opens the region the seq number which he can use will be 7.

We now add an entry KV(r3->v1) again in RS1 so it will have 7 in it (in WAL).

Kill the RS2 first.  This will create a recovered.edits with file name 000006.

Kill RS1.   This will create a recovered.edits with file name 00005.

Now when the region is finally opened in a new RS i will be having the 2 store files and the max seq id from them will be 6.  Now the recovered.edits will also give me 6 as highest seq.  
{code}
      if (maxSeqId <= minSeqId) {
          String msg = "Maximum possible sequenceid for this log is " + maxSeqId
              + ", skipped the whole file, path=" + edits;
          LOG.debug(msg);
          continue;
{code}
Correct me my analysis is wrong.


                
> Skip RecoveredEdits may cause data loss
> ---------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>         Attachments: 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249293#comment-13249293 ] 

Hudson commented on HBASE-5689:
-------------------------------

Integrated in HBase-TRUNK #2726 (See [https://builds.apache.org/job/HBase-TRUNK/2726/])
    HBASE-5689 Skipping RecoveredEdits may cause data loss (Chunhui) (Revision 1310788)

     Result = SUCCESS
tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java

                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, 5689-v4.txt, HBASE-5689.patch, HBASE-5689.patch, HBASE-5689v2.patch, HBASE-5689v3.patch, HBASE-5689v3.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249295#comment-13249295 ] 

Hudson commented on HBASE-5689:
-------------------------------

Integrated in HBase-0.94 #97 (See [https://builds.apache.org/job/HBase-0.94/97/])
    HBASE-5689 Skipping RecoveredEdits may cause data loss (Chunhui) (Revision 1310787)

     Result = SUCCESS
tedyu : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java

                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-testcase.patch, 5689-v4.txt, HBASE-5689.patch, HBASE-5689.patch, HBASE-5689v2.patch, HBASE-5689v3.patch, HBASE-5689v3.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5689:
------------------------------

    Attachment:     (was: HBASE-5689.patch)
    
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-simplified.txt, 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third KV(r3->v3) is lost.
> Let's analyse the upper scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 3.however, RecoveredEdits file f2 will be skiped when initializing region
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira