Posted to issues@hbase.apache.org by "chunhui shen (Created) (JIRA)" <ji...@apache.org> on 2011/11/24 07:46:40 UTC

[jira] [Created] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Splitting hlog and opening region concurrently may cause data loss
------------------------------------------------------------------

                 Key: HBASE-4862
                 URL: https://issues.apache.org/jira/browse/HBASE-4862
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.90.2
            Reporter: chunhui shen


Case Description:
1. The split-hlog thread creates a writer for the file region A/recovered.edits/123456 and is appending log entries to it.
2. Meanwhile the regionserver is opening region A, and during replayRecoveredEditsIfAny() it deletes the file region A/recovered.edits/123456.
3. The split-hlog thread catches the IOException and stops parsing this log file. If skipError = true, it adds the log file to the corrupt logs. However, the data for other regions in this log file is lost.
4. If skipError = false, it checks the filesystem. The filesystem is fine, of course, so it only prints an error log and continues assigning regions. Therefore, the data in the other log files is lost as well.

The case may happen as follows:
1. Move a region from server A to server B
2. Kill server A and server B
3. Restart server A and server B

We could prevent this exception by forbidding deletion of a recovered.edits file
that the split-hlog thread is still appending to.
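
One common way to keep a reader from deleting a file that is still being written is to write it under a temporary name and rename it only once it is complete. The following is a minimal sketch of that idea, assuming a local filesystem and java.nio.file rather than the HDFS or HBase APIs; the names TmpThenRenameWriter, writeEdits and TMP_SUFFIX are hypothetical and do not come from any patch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: a local-filesystem analogue of writing an edits
// file under a temporary name until the writer has finished with it.
class TmpThenRenameWriter {
    static final String TMP_SUFFIX = ".tmp";

    // Writes all entries to "<finalFile>.tmp", then renames it to finalFile.
    static Path writeEdits(Path finalFile, List<String> entries) throws IOException {
        Path tmp = finalFile.resolveSibling(finalFile.getFileName() + TMP_SUFFIX);
        Files.write(tmp, entries, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE);
        // The final name becomes visible only after the file is complete.
        return Files.move(tmp, finalFile, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("recovered.edits");
        Path done = writeEdits(dir.resolve("123456"), Arrays.asList("edit-1", "edit-2"));
        System.out.println(Files.readAllLines(done));
    }
}
```

A reader that ignores names ending in the temporary suffix then never sees, and never deletes, a file that is still being appended to.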

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "ramkrishna.s.vasudevan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-4862:
------------------------------------------

    Fix Version/s: 0.92.0
    
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.90.5
>
>         Attachments: 4862-0.92.txt, 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff, hbase-4862v7for0.90.patch, hbase-4862v7fortrunk.patch

        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "chunhui shen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156965#comment-13156965 ] 

chunhui shen commented on HBASE-4862:
-------------------------------------

@Ted Yu @Todd Lipcon

The two can run concurrently in the following case:
1. Move a region from server A to server B (for example, during balancing)
2. Kill server A and server B
3. Restart server A and server B immediately

Before server A and server B are restarted, log data for this region exists in both servers' log files.
4. After we restart server B, serverShutdownHandler processes this dead server and assigns this region.
5. At the same time, serverShutdownHandler processes dead server B and splits server B's hlog.
Because steps 4 and 5 run concurrently, replayRecoveredEditsIfAny in step 4 runs while the split thread is still appending log entries to this region's
recovered.edits file. So when replayRecoveredEdits deletes the recovered.edits file, an exception is thrown.
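
The interleaving described above can be modeled deterministically with a toy in-memory filesystem. This is only a sketch under stated assumptions: ToyFs, SplitVersusOpenRace and runInterleaving are hypothetical names, and the plain IOException merely stands in for the LeaseExpiredException that HDFS reports; none of this is HBase code.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the race: one actor appends to an edits file while
// another deletes it after replaying what it found.
class SplitVersusOpenRace {

    // Toy stand-in for the filesystem: path -> contents.
    static final class ToyFs {
        private final Map<String, StringBuilder> files = new HashMap<>();

        void create(String path) {
            files.put(path, new StringBuilder());
        }

        void append(String path, String entry) throws IOException {
            StringBuilder contents = files.get(path);
            if (contents == null) {
                // Assumption: mimics the "File does not exist" failure.
                throw new IOException("File does not exist: " + path);
            }
            contents.append(entry).append('\n');
        }

        boolean delete(String path) {
            return files.remove(path) != null;
        }
    }

    // Replays the steps in order; returns true if the splitter's append
    // failed because the region open had already deleted the file.
    static boolean runInterleaving() {
        ToyFs fs = new ToyFs();
        String edits = "regionA/recovered.edits/123456";
        fs.create(edits);                    // splitter creates its writer
        try {
            fs.append(edits, "entry-1");     // splitter is appending
            fs.delete(edits);                // region open replays, then deletes
            fs.append(edits, "entry-2");     // splitter's next append fails
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("writer failed: " + runInterleaving());
    }
}
```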

The master and region server logs in this case are as follows:

master log: 
2011-11-16 11:50:13,037 FATAL org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: WriterThread-1 Got while writing log entry to log 
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/0000000013156791680 File does not exist. [Lease. Holder: DFSClient_hb_m_dw75.kgb.sqa.cm4:60000_1321413286871, pendingcreates: 54] 
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1542) 
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1533) 
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1449) 
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649) 
        at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) 
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 
        at java.lang.reflect.Method.invoke(Method.java:597) 
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557) 
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415) 
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411) 
        at java.security.AccessController.doPrivileged(Native Method) 
        at javax.security.auth.Subject.doAs(Subject.java:396) 
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115) 
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409) 

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) 
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) 
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513) 
        at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:96) 
        at org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:49) 
        at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66) 
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.writeBuffer(HLogSplitter.java:962) 
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.doRun(HLogSplitter.java:926) 
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.run(HLogSplitter.java:898) 



regionserver log: 
2011-11-16 11:49:49,727 ERROR org.apache.hadoop.hbase.regionserver.HRegion: Failed delete of hdfs://dw74.kgb.sqa.cm4:9000/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/0000000013156791680
2011-11-16 11:49:49,732 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Deleted recovered.edits file=hdfs://dw74.kgb.sqa.cm4:9000/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/0000000013156800103
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch

       

[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4862:
--------------------------

    Attachment: 4862-v6-90.txt
    
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862-v6-90.txt, 4862-v6-trunk.txt, 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff

        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157385#comment-13157385 ] 

Ted Yu commented on HBASE-4862:
-------------------------------

@Todd:
Do you need more details from Chunhui ?

Thanks
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff

        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156765#comment-13156765 ] 

Ted Yu commented on HBASE-4862:
-------------------------------

Nice work.
The patch doesn't apply to 0.90 branch:
{code}
Hunk #4 succeeded at 783 (offset -332 lines).
1 out of 4 hunks FAILED -- saving rejects to file src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java.rej
...
patch unexpectedly ends in middle of line
2 out of 2 hunks ignored -- saving rejects to file src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java.rej
{code}
Please rebase your patch for 0.90

A separate patch for TRUNK would be helpful for HadoopQA to run test suite.

Comments about the changes:
getTmpRecoveredEditsFileName() is only used once and there is no javadoc for it. Maybe we don't need to create the method, just append ".tmp" directly to the filename.
{code}
+    // Convert file name ends with .tmp, so ensure region's replayRecoveredEdits
{code}
The beginning of the above should read 'Append filename with '.tmp' to ensure'
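
The naming scheme discussed above only helps if the side that replays recovered edits skips files that still carry the ".tmp" suffix. A minimal sketch of such a listing step follows, assuming a local filesystem and java.nio.file rather than the HBase code under review; CompletedEditsLister and listCompletedEdits are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration: list only the edits files that are complete,
// i.e. those whose names do not end with the temporary suffix.
class CompletedEditsLister {
    static final String TMP_SUFFIX = ".tmp";

    static List<Path> listCompletedEdits(Path editsDir) throws IOException {
        List<Path> completed = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(editsDir)) {
            for (Path candidate : stream) {
                String name = candidate.getFileName().toString();
                // Files still being written keep the ".tmp" suffix and are skipped.
                if (Files.isRegularFile(candidate) && !name.endsWith(TMP_SUFFIX)) {
                    completed.add(candidate);
                }
            }
        }
        // Lexicographic order matches numeric order only for zero-padded names.
        Collections.sort(completed);
        return completed;
    }
}
```

Because in-progress files never appear in the returned list, a caller that replays and then deletes the listed files cannot remove a file that a writer still has open.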
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>         Attachments: 4862.patch

        

[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "chunhui shen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-4862:
--------------------------------

    Attachment: hbase-4862v7fortrunk.patch
                hbase-4862v7for0.90.patch

Based on patch v6; updated the javadoc of HLog#getSplitEditFilesSorted.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff, hbase-4862v7for0.90.patch, hbase-4862v7fortrunk.patch

        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157339#comment-13157339 ] 

Ted Yu commented on HBASE-4862:
-------------------------------

I could run the test suite by executing 'mvn test' on a MacBook.
PreCommit builds 371 and 373 didn't run any tests.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff

        

[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "chunhui shen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-4862:
--------------------------------

    Attachment: hbase-4862v3fortrunk.diff
                hbase-4862v3for0.90.diff
    
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff

        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157680#comment-13157680 ] 

Ted Yu commented on HBASE-4862:
-------------------------------

{code}
        // Skip the test which creates a splitter that reads and writes the
        // data without touching disk. testThreading#TestHLogSplit .etc
        if (fs.exists(wap.p)) {
{code}
The javadoc should read:
{code}
        // Skip the unit tests which create a splitter that reads and writes the
        // data without touching disk. TestHLogSplit#testThreading is an example.
{code}
A specific test is written as classname#testname.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff

        

[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4862:
--------------------------

    Status: Patch Available  (was: Open)
    
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff

        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158146#comment-13158146 ] 

Hadoop QA commented on HBASE-4862:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12505285/hbase-4862v7fortrunk.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -162 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 67 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
     

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/388//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/388//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/388//console

This message is automatically generated.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff, hbase-4862v7for0.90.patch, hbase-4862v7fortrunk.patch


[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "chunhui shen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-4862:
--------------------------------

    Attachment: hbase-4862v1 for 0.90.diff
    
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff


[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159790#comment-13159790 ] 

Hudson commented on HBASE-4862:
-------------------------------

Integrated in HBase-0.92 #163 (See [https://builds.apache.org/job/HBase-0.92/163/])
    HBASE-4862  Splitting hlog and opening region concurrently may cause data loss
               (Chunhui Shen) move JIRA to 0.90 section in CHANGES.txt
HBASE-4862  Splitting hlog and opening region concurrently may cause data loss
               (Chunhui Shen)

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java

                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862-0.92.txt, 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff, hbase-4862v7for0.90.patch, hbase-4862v7fortrunk.patch


[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "chunhui shen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157652#comment-13157652 ] 

chunhui shen commented on HBASE-4862:
-------------------------------------

@Ted
I added tests to this patch in patchV5.

On Red Hat Enterprise Linux Server release 5.4 (Tikanga),
the test results are as follows:

For trunk with  patchV5:
_
Results :

Failed tests:   testResetZooKeeperSession(org.apache.hadoop.hbase.replication.TestReplicationPeer): ReplicationPeer ZooKeeper session was not properly expired.
  testClosing(org.apache.hadoop.hbase.client.TestHCM)

Tests run: 1174, Failures: 2, Errors: 0, Skipped: 8

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2:00:49.122s
[INFO] Finished at: Sun Nov 27 02:41:40 CST 2011
[INFO] Final Memory: 35M/361M
[INFO] ------------------------------------------------------------------------
_



For 0.90 with  patchV5:

_
Results :

Tests run: 702, Failures: 0, Errors: 0, Skipped: 9

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:15:37.342s
[INFO] Finished at: Sun Nov 27 11:00:07 CST 2011
[INFO] Final Memory: 26M/525M
[INFO] ------------------------------------------------------------------------
_

The two failed tests in trunk are the same as in the last run. One of them (TestReplicationPeer#testResetZooKeeperSession) passes when run separately,
and the other is related to HBASE-4874.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff


[jira] [Updated] (HBASE-4862) Split hlog and open region concurrently happend may cause data loss

Posted by "chunhui shen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-4862:
--------------------------------

    Summary: Split hlog and open region concurrently happend may cause data loss  (was: Split hlog and open region currently happend may cause data loss)
    
> Split hlog and open region concurrently happend may cause data loss
> -------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>         Attachments: 4862.patch


[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4862:
--------------------------

    Attachment: 4862-0.92.txt

Patch for 0.92 branch.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862-0.92.txt, 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff, hbase-4862v7for0.90.patch, hbase-4862v7fortrunk.patch


[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4862:
--------------------------

    Attachment:     (was: 4862-v6-trunk.txt)
    
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff


[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158178#comment-13158178 ] 

Hudson commented on HBASE-4862:
-------------------------------

Integrated in HBase-TRUNK #2490 (See [https://builds.apache.org/job/HBase-TRUNK/2490/])
    HBASE-4862  Splitting hlog and opening region concurrently may cause data loss
               (Chunhui Shen)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java

                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862-0.92.txt, 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff, hbase-4862v7for0.90.patch, hbase-4862v7fortrunk.patch


[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157412#comment-13157412 ] 

Hadoop QA commented on HBASE-4862:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12505180/hbase-4862v3fortrunk.diff
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 javadoc.  The javadoc tool appears to have generated -162 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 67 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
     

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/377//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/377//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/377//console

This message is automatically generated.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff


[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4862:
--------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12505283/4862-v6-trunk.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -162 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 67 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
     

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/387//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/387//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/387//console

This message is automatically generated.)
    
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff, hbase-4862v7for0.90.patch, hbase-4862v7fortrunk.patch


[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158347#comment-13158347 ] 

Hudson commented on HBASE-4862:
-------------------------------

Integrated in HBase-TRUNK #2491 (See [https://builds.apache.org/job/HBase-TRUNK/2491/])
    HBASE-4862  Splitting hlog and opening region concurrently may cause data loss
               (Chunhui Shen) Move JIRA to 0.90 section

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt

                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862-0.92.txt, 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff, hbase-4862v7for0.90.patch, hbase-4862v7fortrunk.patch


[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "ramkrishna.s.vasudevan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-4862:
------------------------------------------

    Fix Version/s:     (was: 0.92.0)
    
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: 4862-0.92.txt, 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff, hbase-4862v7for0.90.patch, hbase-4862v7fortrunk.patch
>
>

        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "chunhui shen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157410#comment-13157410 ] 

chunhui shen commented on HBASE-4862:
-------------------------------------

@Ted
I have amended the patch again.
Please check.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff
>
>

        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157367#comment-13157367 ] 

Hadoop QA commented on HBASE-4862:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12505172/hbase-4862v1+for+trunk.diff
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 javadoc.  The javadoc tool appears to have generated -162 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 67 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
     

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/374//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/374//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/374//console

This message is automatically generated.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff
>
>

        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158835#comment-13158835 ] 

stack commented on HBASE-4862:
------------------------------

This is integrated.  Can we close it?
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862-0.92.txt, 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff, hbase-4862v7for0.90.patch, hbase-4862v7fortrunk.patch
>
>

        

[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4862:
--------------------------

      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)
    
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862-0.92.txt, 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff, hbase-4862v7for0.90.patch, hbase-4862v7fortrunk.patch
>
>

        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "chunhui shen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156967#comment-13156967 ] 

chunhui shen commented on HBASE-4862:
-------------------------------------

After a region has been successfully moved from server A to server B,
the edits for this region in server A's log file are safe because they have already been flushed.
However, if this exception is encountered while splitting the hlog, the log data of the other regions in server A's log file is affected.
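
The effect described here can be illustrated with a toy in-memory model, which is not HBase code: a log file holds interleaved edits for several regions, and the splitter stops parsing the file as soon as a write for one region fails. The class name, the method name, and the {region, edit} entry layout are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SplitAbortSketch {
    // Each log entry is {region, edit}. Returns the edits recovered per region.
    // failingRegion stands in for the region whose recovered-edits file
    // was deleted while the splitter was still writing to it.
    static Map<String, List<String>> split(String[][] log, String failingRegion) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (String[] entry : log) {
            String region = entry[0];
            if (region.equals(failingRegion)) {
                break; // the write fails and the splitter stops parsing this log file
            }
            out.computeIfAbsent(region, r -> new ArrayList<>()).add(entry[1]);
        }
        return out;
    }
}
```

In this model, every entry that follows the failed write is dropped, including entries that belong to regions whose own files were never touched.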
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch
>
>

        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157407#comment-13157407 ] 

Hadoop QA commented on HBASE-4862:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12505178/hbase-4862v2fortrunk.diff
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 javadoc.  The javadoc tool appears to have generated -162 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 67 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
     

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/376//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/376//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/376//console

This message is automatically generated.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff
>
>

        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157636#comment-13157636 ] 

Jonathan Hsieh commented on HBASE-4862:
---------------------------------------

How feasible is it to add testing to this patch?  Maybe simulate the failure situation by aborting RS's and then starting them like in the TestSplitTransactionOnCluster tests?
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff
>
>

        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157379#comment-13157379 ] 

Ted Yu commented on HBASE-4862:
-------------------------------

Chunhui ran the patch through the test suite.

The OS is:
Red Hat Enterprise Linux Server release 5.4 (Tikanga)
{code}
Results :
Failed tests:   testResetZooKeeperSession(org.apache.hadoop.hbase.replication.TestReplicationPeer): ReplicationPeer ZooKeeper session
was not properly expired.
  testClosing(org.apache.hadoop.hbase.client.TestHCM)
Tests run: 1173, Failures: 2, Errors: 0, Skipped: 8
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2:02:44.930s
{code}
testClosing failure is captured in HBASE-4874.
TestReplicationPeer passed when run manually.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff
>
>

        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "shenchunhui (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157525#comment-13157525 ] 

shenchunhui commented on HBASE-4862:
------------------------------------

Ted,
I found that patch v3 causes some tests to fail after changing fs.rename(wap.p, dst) to
            if (!fs.rename(wap.p, dst)) {
              throw new IOException("Failed renaming " + wap.p + " to " + dst);
            }
I will amend it and give you the test results later.
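
The change quoted above turns a silently ignored rename result into an error. A self-contained sketch of the same pattern follows, using java.io.File.renameTo (which also reports failure through a boolean return value) as a stand-in for the filesystem call in the patch; the class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;

class RenameCheckSketch {
    // Rename src to dst and fail loudly if the rename did not happen.
    static void renameOrThrow(File src, File dst) throws IOException {
        if (!src.renameTo(dst)) {
            throw new IOException("Failed renaming " + src + " to " + dst);
        }
    }
}
```

Code paths that previously tolerated a failed rename without noticing will now see an exception, which is one plausible reason for existing tests to start failing after such a change.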






                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff
>
>

        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158141#comment-13158141 ] 

Hadoop QA commented on HBASE-4862:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12505283/4862-v6-trunk.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -162 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 67 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
     

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/387//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/387//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/387//console

This message is automatically generated.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff
>
>

        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158177#comment-13158177 ] 

Ted Yu commented on HBASE-4862:
-------------------------------

Integrated to 0.90, 0.92 branches and TRUNK.

Thanks for the patch Chunhui.

Thanks for the review Jonathan.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862-0.92.txt, 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff, hbase-4862v7for0.90.patch, hbase-4862v7fortrunk.patch
>
>


        

[jira] [Issue Comment Edited] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157678#comment-13157678 ] 

Ted Yu edited comment on HBASE-4862 at 11/27/11 6:50 AM:
---------------------------------------------------------

@Jonathan
> What happens if the .temp gets left behind without being renamed?
If the .temp file gets left behind, it means the log splitting failed, and the .temp file will be deleted by the next log splitting.
Note that splitting the same hlog file again creates a file with the same name in the region's recovered.edits directory.

Thanks for your suggestion.
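
A minimal sketch of the cleanup described above, assuming the splitter derives the same temporary file name each time it splits the same hlog. StaleTempSketch and prepareTemp are hypothetical names used only for illustration, and java.nio.file on a local filesystem stands in for the Hadoop FileSystem API that the real code uses.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; not taken from the HBASE-4862 patch.
final class StaleTempSketch {
    static final String TEMP_SUFFIX = ".temp"; // suffix as named in the comment

    /**
     * Prepares an empty temporary edits file for a (re)split.
     * Returns true if a leftover file from a failed earlier split was discarded.
     */
    static boolean prepareTemp(Path editsDir, String name) {
        Path temp = editsDir.resolve(name + TEMP_SUFFIX);
        try {
            Files.createDirectories(editsDir);
            // Same hlog -> same temp name, so a leftover is found and dropped here.
            boolean hadStale = Files.deleteIfExists(temp);
            Files.createFile(temp);
            return hadStale;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("recovered.edits");
        System.out.println("first split discarded a leftover: " + prepareTemp(dir, "123456"));
        System.out.println("retry discarded a leftover: " + prepareTemp(dir, "123456"));
    }
}
```

Because the leftover is removed by the writer side on the next attempt, the region-open side never needs to touch unfinished files.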


                
      was (Author: zjushch):
    @Jonathan
What happens if the .temp gets left behind without being renamed?
If the the .temp gets left ,it means the spliting log is failed, and the .temp file would be deleted in the next spliting log.
You could find that, for the same splitted hlog file, it creates the same name file in the region's recoverd.edits directory

Thanks for your suggestion.


                  
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff
>
>


        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156858#comment-13156858 ] 

Todd Lipcon commented on HBASE-4862:
------------------------------------

wait, wait -- _why_ is this happening concurrently? A region should never be opened until the split process is done for that region. If this is happening we have a much larger issue, which we shouldn't be working around with tmp file names, etc.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch
>
>


        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157411#comment-13157411 ] 

Ted Yu commented on HBASE-4862:
-------------------------------

+1 on patch v3. 

Please run patch for 0.90 through test suite and let us know the results. 
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff
>
>


        

[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4862:
--------------------------

    Summary: Splitting hlog and opening region concurrently may cause data loss  (was: Split hlog and open region concurrently happend may cause data loss)
    
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch
>
>


        

[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4862:
--------------------------

    Attachment: 4862-v6-trunk.patch
    
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff
>
>


        

[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4862:
--------------------------

    Attachment: 4862.txt

I ran a few tests based on the patch for TRUNK and didn't see any failures.
Reattaching patch for TRUNK.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff
>
>


        

[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "chunhui shen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-4862:
--------------------------------

    Attachment: hbase-4862v2fortrunk.diff
                hbase-4862v2for0.90.diff

@Ted
I have amended the patch.
Please check.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff
>
>


        

[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4862:
--------------------------

    Attachment: 4862.txt
    
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff
>
>


        

[jira] [Updated] (HBASE-4862) Split hlog and open region currently happend may cause data loss

Posted by "chunhui shen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-4862:
--------------------------------

    Attachment: 4862.patch

Split hlog: add the suffix ".tmp" to the file in the recovered.edits directory when creating it,
and rename it to drop the suffix after the writer is closed.

replayRecoveredEditsIfAny: skip any file whose name ends with ".tmp".
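
The two changes above can be sketched roughly as follows. This is only an illustration, not the attached patch: TmpSuffixSketch, writeEdits and replayableFiles are hypothetical names, edits are modelled as plain text lines, and java.nio.file on a local filesystem stands in for the Hadoop FileSystem API that the real code works against.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the ".tmp then rename" idea; not the HBase patch itself.
final class TmpSuffixSketch {
    static final String TMP_SUFFIX = ".tmp"; // suffix named in the comment above

    // Splitter side: write everything to "<name>.tmp", close it, then rename.
    static Path writeEdits(Path editsDir, String name, List<String> edits) {
        Path tmp = editsDir.resolve(name + TMP_SUFFIX);
        Path done = editsDir.resolve(name);
        try {
            Files.createDirectories(editsDir);
            Files.write(tmp, edits, StandardCharsets.UTF_8); // opens, writes, closes
            // The final name only appears once the file is complete.
            return Files.move(tmp, done, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Region-open side: consider only finished files, skipping "*.tmp".
    static List<String> replayableFiles(Path editsDir) {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(editsDir)) {
            for (Path f : files) {
                String n = f.getFileName().toString();
                if (!n.endsWith(TMP_SUFFIX)) {
                    names.add(n);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("recovered.edits");
        Files.write(dir.resolve("123457.tmp"), new byte[0]); // a split still in progress
        writeEdits(dir, "123456", Arrays.asList("edit-1", "edit-2"));
        System.out.println("files visible to replay: " + replayableFiles(dir));
    }
}
```

With this arrangement the replay side never sees, and so never deletes, a file that a splitter is still appending to.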

                
> Split hlog and open region currently happend may cause data loss
> ----------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>         Attachments: 4862.patch
>
>


        

[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4862:
--------------------------

    Status: Patch Available  (was: Open)
    
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff
>
>


        

[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4862:
--------------------------

    Priority: Critical  (was: Major)

Lifting priority as Ramkrishna suggested.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff
>
>


        

[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4862:
--------------------------

    Status: Open  (was: Patch Available)
    
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff
>
>


        

[jira] [Updated] (HBASE-4862) Split hlog and open region concurrently happend may cause data loss

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4862:
--------------------------

    Fix Version/s: 0.90.5
                   0.94.0
                   0.92.0
    
> Split hlog and open region concurrently happend may cause data loss
> -------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch
>
>


        

[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "chunhui shen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-4862:
--------------------------------

    Attachment: hbase-4862v1 for trunk.diff
                hbase-4862v1 for 0.90.diff

Grant license to ASF for the attached patch
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff


        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157250#comment-13157250 ] 

Ted Yu commented on HBASE-4862:
-------------------------------

Log snippets from Chunhui.
Region C was 3591e9867a4c125493dc82168854ea0c
File F was 0000000013156791680

Master log:
{code}
2011-11-16 11:47:23,134 INFO org.apache.hadoop.hbase.master.ServerManager:
  Triggering server recovery; existingServer serverB,60020,1321415172631 looks stale
  2011-11-16 11:47:23,134 DEBUG org.apache.hadoop.hbase.master.ServerManager:
  Added=serverB,60020,1321415172631 to dead servers, submitted shutdown handler to be executed, root=false, meta=true

  2011-11-16 11:47:29,305 INFO org.apache.hadoop.hbase.master.ServerManager:
  Triggering server recovery; existingServer serverA,60020,1321415179549 looks stale
  2011-11-16 11:47:29,305 DEBUG org.apache.hadoop.hbase.master.ServerManager:
  Added=serverA,60020,1321415179549 to dead servers, submitted shutdown handler to be executed, root=false, meta=false

  2011-11-16 11:48:28,700 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter:
  Splitting 28 hlog(s) in hdfs://serverX:9000/hbase-common/.logs/serverB,60020,1321414043798

  2011-11-16 11:48:30,657 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter:
  Creating writer path=hdfs://serverX:9000/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/0000000013156800103 region=3591e9867a4c125493dc82168854ea0c

  2011-11-16 11:49:17,855 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter:
  Closed path hdfs://serverX:9000/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/0000000013156800103 (wrote 75875 edits in 3228ms)

  2011-11-16 11:49:19,629 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter:
  Splitting 28 hlog(s) in hdfs://serverX:9000/hbase-common/.logs/serverA,60020,1321414056134

  2011-11-16 11:49:20,650 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter:
  Creating writer path=hdfs://serverX:9000/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/0000000013156791680 region=3591e9867a4c125493dc82168854ea0c

  2011-11-16 11:49:36,731 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
  Assigning region writetest1,19ILNKUHRKQ3BT0FLC9CMVWBP2ZPRV4W7XYA491BE6ZS2JE9132BO5GABIHNJHDU79TXBA4OOAP8OEIVTQ0PDHZB26QI5XHY17BK,1321267032810.3591e9867a4c125493dc82168854ea0c. to serverD,60020,1321415224381

  2011-11-16 11:49:49,755 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region writetest1,19ILNKUHRKQ3BT0FLC9CMVWBP2ZPRV4W7XYA491BE6ZS2JE9132BO5GABIHNJHDU79TXBA4OOAP8OEIVTQ0PDHZB26QI5XHY17BK,1321267032810.3591e9867a4c125493dc82168854ea0c. on serverD,60020,1321415224381

  2011-11-16 11:50:13,030 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/0000000013156791680 File does not exist.

  2011-11-16 11:50:13,037 FATAL org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: WriterThread-1 Got while writing log entry to log
  org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/0000000013156791680 File does not exist.

  2011-11-16 11:50:13,051 ERROR org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting hdfs://serverX:9000/hbase-common/.logs/serverA,60020,1321414056134
{code}
Log from region server D:
{code}
2011-11-16 11:49:36,730 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: writetest1,19ILNKUHRKQ3BT0FLC9CMVWBP2ZPRV4W7XYA491BE6ZS2JE9132BO5GABIHNJHDU79TXBA4OOAP8OEIVTQ0PDHZB26QI5XHY17BK,1321267032810.3591e9867a4c125493dc82168854ea0c.

2011-11-16 11:49:49,727 ERROR org.apache.hadoop.hbase.regionserver.HRegion:
Failed delete of hdfs://serverX:9000/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/0000000013156791680
 
2011-11-16 11:49:49,733 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Onlined writetest1,19ILNKUHRKQ3BT0FLC9CMVWBP2ZPRV4W7XYA491BE6ZS2JE9132BO5GABIHNJHDU79TXBA4OOAP8OEIVTQ0PDHZB26QI5XHY17BK,1321267032810.3591e9867a4c125493dc82168854ea0c.; next sequenceid=13160672878
{code}

                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff


        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157671#comment-13157671 ] 

Jonathan Hsieh commented on HBASE-4862:
---------------------------------------

@chunhui

I have a question and a few nits.

What happens if the .temp file gets left behind without being renamed?

You might want to mention that hlog files in progress (those with the .temp suffix) are excluded here.
{code}
+        // After creating writer, simulate partial region's
+        // replayRecoveredEditsIfAny() which gets SplitEditFiles of this
+        // region,and delete them.
{code}

Also, you probably want to update the javadoc of getSplitEditFilesSorted.

The comment should probably say "most likely" instead of "mostly":
{code}
+    try{
+      logSplitter.splitLog();
+    } catch (IOException e) {
+      LOG.info(e);
+      Assert.fail("Throws IOException when spliting "
+          + "log, it is mostly because writing file does not "
+          + "exist which is caused by concurrent replayRecoveredEditsIfAny()");
+    }
+    if (fs.exists(corruptDir)) {
+      if (fs.listStatus(corruptDir).length > 0) {
+        Assert.fail("There are some corrupt logs, "
+            + "it is mostly caused by concurrent replayRecoveredEditsIfAny()");
+      }
+    }
+  }
{code}
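
For illustration only, the exclusion of in-progress files could look roughly like the sketch below. It uses plain java.nio rather than the HBase/HDFS API, and the class name, the method name, and the ".temp" constant are assumptions made for this example; it is not the code from the attached patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SplitEditFilesSketch {
    // Assumed suffix for files the splitter is still writing.
    static final String TMP_SUFFIX = ".temp";

    /** Lists completed edit files in name order, skipping in-progress ".temp" files. */
    static List<Path> completedEditFiles(Path editsDir) throws IOException {
        try (Stream<Path> entries = Files.list(editsDir)) {
            return entries
                .filter(Files::isRegularFile)
                .filter(p -> !p.getFileName().toString().endsWith(TMP_SUFFIX))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("recovered.edits");
        Files.createFile(dir.resolve("0000000013156800103"));
        Files.createFile(dir.resolve("0000000013156791680" + TMP_SUFFIX));
        // Only the file without the ".temp" suffix is returned.
        for (Path p : completedEditFiles(dir)) {
            System.out.println(p.getFileName());
        }
    }
}
```

With a filter like this, a region that opens while a split is still running never sees, replays, or deletes the file being written. Whether a leftover .temp file is ever cleaned up is a separate question that the sketch does not address.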

                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff


        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157270#comment-13157270 ] 

ramkrishna.s.vasudevan commented on HBASE-4862:
-----------------------------------------------

If the scenario is valid, do we need to raise the priority of this defect? It may not be common, though.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff


        

[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4862:
--------------------------

    Attachment:     (was: 4862.txt)
    
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff


        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157388#comment-13157388 ] 

Ted Yu commented on HBASE-4862:
-------------------------------

{code}
+    if (fileName.endsWith(HLog.RECOVERED_LOG_TMPFILE_SUFFIX))
+      fileName = fileName.split(HLog.RECOVERED_LOG_TMPFILE_SUFFIX)[0];
{code}
Please enclose the second line above in curly braces.

W.r.t. the fs.rename() call, here is the javadoc from ClientProtocol.rename() (which is called by fs.rename()):
{code}
   * @return true if successful, or false if the old name does not exist
   * or if the new name already belongs to the namespace.
{code}
We should check the return value in addition to catching the exception.
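
A rough sketch of the point about return values follows. It uses java.io.File.renameTo(), which also reports failure by returning false, as a stand-in for fs.rename(). The class, the helper names, and the ".temp" value are assumptions for illustration, not the patch itself. The suffix is stripped with substring() because String.split() treats its argument as a regular expression, so a "." in the suffix would match any character.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class RenameCheckSketch {
    // Assumed value of the in-progress suffix; hypothetical constant.
    static final String TMP_SUFFIX = ".temp";

    /** Strips the suffix literally, without regular-expression semantics. */
    static String finalName(String fileName) {
        if (fileName.endsWith(TMP_SUFFIX)) {
            return fileName.substring(0, fileName.length() - TMP_SUFFIX.length());
        }
        return fileName;
    }

    /** Renames an in-progress file to its final name; a false return is treated as an error. */
    static File finish(File tmp) throws IOException {
        File dst = new File(tmp.getParentFile(), finalName(tmp.getName()));
        if (!tmp.renameTo(dst)) {
            throw new IOException("Failed renaming " + tmp + " to " + dst);
        }
        return dst;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("recovered.edits").toFile();
        File tmp = new File(dir, "0000000013156791680" + TMP_SUFFIX);
        Files.write(tmp.toPath(), new byte[] {1, 2, 3});
        System.out.println(finish(tmp).getName());
    }
}
```

Turning a false return into an IOException means a failed rename is not silently ignored, which matters here because a file left under its temporary name would be skipped on replay.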
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff


        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "chunhui shen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156991#comment-13156991 ] 

chunhui shen commented on HBASE-4862:
-------------------------------------

@Ted @Todd

I'm sorry my explanation was not clear.
I think I should describe the detailed case first.

Throughout the following process, a client is putting data to region C.
1. Region C is successfully moved from server A to server B.
At this moment, there are log entries for region C in both server A's log file and server B's log file.

2. Kill server A and server B.

3. Restart server B.
Now the master starts ServerShutdownHandler for server B and assigns region C to server D.

4. Before region C is opened on server D, restart server A.
Now the master starts ServerShutdownHandler for server A and splits server A's log file.
Because there are log entries for region C in server A's log file (why? see 1), the split hlog thread creates a file F in region C's recovered.edits directory.

5. While opening, region C executes replayRecoveredEdits() and then deletes file F.

6. Therefore, the split in step 4 throws an IOException saying that file F does not exist, which stops parsing of the current server A hlog file; the other data in that hlog file is lost.
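
The interleaving in steps 4 to 6 can be modelled with a toy in-memory file system. Everything here (ToyFs, splitLog, the region names) is hypothetical and only mimics the behaviour seen in the logs, where appending to a deleted file fails; it is not HBase code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SplitRaceSketch {

    /** Toy stand-in for HDFS: appending to a path that was deleted fails. */
    static class ToyFs {
        private final Map<String, List<String>> files = new HashMap<>();

        void create(String path) { files.put(path, new ArrayList<>()); }

        void append(String path, String edit) throws IOException {
            List<String> content = files.get(path);
            if (content == null) {
                throw new IOException("No lease on " + path + " File does not exist.");
            }
            content.add(edit);
        }

        boolean delete(String path) { return files.remove(path) != null; }

        int editCount(String path) {
            List<String> content = files.get(path);
            return content == null ? 0 : content.size();
        }
    }

    /**
     * Splits one server log into per-region files. Each entry is {region, edit}.
     * The hook runs after every append so a caller can interleave a region open.
     */
    static void splitLog(ToyFs fs, String[][] log, Runnable afterEachAppend) throws IOException {
        Set<String> opened = new HashSet<>();
        for (String[] entry : log) {
            String path = entry[0] + "/recovered.edits/F";
            if (opened.add(path)) {
                fs.create(path);           // first edit for this region: create the writer
            }
            fs.append(path, entry[1]);     // throws once the file is gone
            afterEachAppend.run();
        }
    }

    public static void main(String[] args) {
        ToyFs fs = new ToyFs();
        String[][] serverALog = {
            {"regionC", "put-1"}, {"regionC", "put-2"}, {"regionX", "put-3"}
        };
        try {
            // Region C opens elsewhere right after the first append and deletes F.
            splitLog(fs, serverALog, () -> fs.delete("regionC/recovered.edits/F"));
        } catch (IOException e) {
            System.out.println("split aborted: " + e.getMessage());
        }
        // prints "edits recovered for regionX: 0"
        System.out.println("edits recovered for regionX: "
            + fs.editCount("regionX/recovered.edits/F"));
    }
}
```

The edit for regionX is never written because the split stops at the first failed append, which is the loss of "other data" described in step 6.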

The posted region server log is server B's log, and it is doing replayRecoveredEditsIfAny(). Although it prints "Failed delete" for the file recovered.edits/0000000013156791680, the file has in fact been deleted, and the master throws a file-does-not-exist exception:
2011-11-16 11:50:13,037 FATAL org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: WriterThread-1 Got while writing log entry to log org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/0000000013156791680 File does not exist.

I'm not sure whether this is clear now; I await your questions.

Thanks!


                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch


        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156866#comment-13156866 ] 

Ted Yu commented on HBASE-4862:
-------------------------------

@Chunhui:
Can you attach master and region server log snippets which would show us what happened ?

Thanks
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch


        

[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "chunhui shen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-4862:
--------------------------------

    Attachment: hbase-4862v1 for trunk.diff
    
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff


        

[jira] [Assigned] (HBASE-4862) Split hlog and open region concurrently happend may cause data loss

Posted by "Ted Yu (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu reassigned HBASE-4862:
-----------------------------

    Assignee: chunhui shen
    
> Split hlog and open region concurrently happend may cause data loss
> -------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch


        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157289#comment-13157289 ] 

Hadoop QA commented on HBASE-4862:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12505060/hbase-4862v1+for+trunk.diff
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 javadoc.  The javadoc tool appears to have generated -162 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 67 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                  org.apache.hadoop.hbase.master.TestRollingRestart
                  org.apache.hadoop.hbase.master.TestRestartCluster
                  org.apache.hadoop.hbase.thrift2.TestThriftHBaseServiceHandler
                  org.apache.hadoop.hbase.regionserver.wal.TestHLogBench
                  org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD
                  org.apache.hadoop.hbase.regionserver.TestAtomicOperation
                  org.apache.hadoop.hbase.TestInfoServers
                  org.apache.hadoop.hbase.regionserver.TestParallelPut
                  org.apache.hadoop.hbase.regionserver.wal.TestLogRolling
                  org.apache.hadoop.hbase.regionserver.TestStoreFileBlockCacheSummary
                  org.apache.hadoop.hbase.TestRegionRebalancing
                  org.apache.hadoop.hbase.regionserver.wal.TestLogRollAbort
                  org.apache.hadoop.hbase.regionserver.TestFSErrorsExposed
                  org.apache.hadoop.hbase.ipc.TestDelayedRpc
                  org.apache.hadoop.hbase.master.TestDistributedLogSplitting
                  org.apache.hadoop.hbase.regionserver.wal.TestWALReplay
                  org.apache.hadoop.hbase.master.TestHMasterRPCException
                  org.apache.hadoop.hbase.regionserver.TestHRegion
                  org.apache.hadoop.hbase.client.TestMultipleTimestamps
                  org.apache.hadoop.hbase.regionserver.TestHRegionServerBulkLoad
                  org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
                  org.apache.hadoop.hbase.client.TestMetaScanner
                  org.apache.hadoop.hbase.master.TestMaster
                  org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable
                  org.apache.hadoop.hbase.TestDrainingServer
                  org.apache.hadoop.hbase.regionserver.TestSplitLogWorker
                  org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction
                  org.apache.hadoop.hbase.master.TestZKBasedOpenCloseRegion
                  org.apache.hadoop.hbase.avro.TestAvroServer
                  org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol
                  org.apache.hadoop.hbase.regionserver.wal.TestHLogSplit
                  org.apache.hadoop.hbase.thrift.TestThriftServer
                  org.apache.hadoop.hbase.regionserver.TestRegionServerMetrics
                  org.apache.hadoop.hbase.master.TestMasterFailover
                  org.apache.hadoop.hbase.regionserver.wal.TestHLog
                  org.apache.hadoop.hbase.TestMultiVersions
                  org.apache.hadoop.hbase.master.TestMasterTransitions
                  org.apache.hadoop.hbase.master.TestSplitLogManager
                  org.apache.hadoop.hbase.master.TestOpenedRegionHandler
                  org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/369//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/369//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/369//console

This message is automatically generated.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff
>
>


        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157918#comment-13157918 ] 

Hadoop QA commented on HBASE-4862:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12505251/4862-v6-trunk.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -162 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 67 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
     

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/379//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/379//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/379//console

This message is automatically generated.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862-v6-90.txt, 4862-v6-trunk.txt, 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff
>
>


        

[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "ramkrishna.s.vasudevan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-4862:
------------------------------------------

    Fix Version/s: 0.92.0
    
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.90.5
>
>         Attachments: 4862-0.92.txt, 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff, hbase-4862v7for0.90.patch, hbase-4862v7fortrunk.patch
>
>


        

[jira] [Issue Comment Edited] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157652#comment-13157652 ] 

Ted Yu edited comment on HBASE-4862 at 11/27/11 5:50 AM:
---------------------------------------------------------

@Ted
I added tests to this patch in patchV5.

On Red Hat Enterprise Linux Server release 5.4 (Tikanga), the test results are as follows:

For trunk with  patchV5:
_
Results :

Failed tests:   testResetZooKeeperSession(org.apache.hadoop.hbase.replication.TestReplicationPeer): ReplicationPeer ZooKeeper session was not properly expired.
  testClosing(org.apache.hadoop.hbase.client.TestHCM)

Tests run: 1174, Failures: 2, Errors: 0, Skipped: 8

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2:00:49.122s
[INFO] Finished at: Sun Nov 27 02:41:40 CST 2011
[INFO] Final Memory: 35M/361M
[INFO] ------------------------------------------------------------------------
_



For 0.90 with  patchV5:

_
Results :

Tests run: 702, Failures: 0, Errors: 0, Skipped: 9

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:15:37.342s
[INFO] Finished at: Sun Nov 27 11:00:07 CST 2011
[INFO] Final Memory: 26M/525M
[INFO] ------------------------------------------------------------------------
_

The two failed tests in trunk are the same as in the last run: one of them (TestReplicationPeer#testResetZooKeeperSession) passes when run separately, and the other is related to HBASE-4874.
                
      was (Author: zjushch):
    @Ted
I add testing to this patch in patchV5.

In the OS:Red Hat Enterprise Linux Server release 5.4 (Tikanga)
The test results is as the following:

For trunk with  patchV5:
_
Results :

Failed tests:   testResetZooKeeperSession(org.apache.hadoop.hbase.replication.TestReplicationPeer): ReplicationPeer ZooKeeper session 

was not properly expired.
  testClosing(org.apache.hadoop.hbase.client.TestHCM)

Tests run: 1174, Failures: 2, Errors: 0, Skipped: 8

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2:00:49.122s
[INFO] Finished at: Sun Nov 27 02:41:40 CST 2011
[INFO] Final Memory: 35M/361M
[INFO] ------------------------------------------------------------------------
_



For 0.90 with  patchV5:

_
Results :

Tests run: 702, Failures: 0, Errors: 0, Skipped: 9

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:15:37.342s
[INFO] Finished at: Sun Nov 27 11:00:07 CST 2011
[INFO] Final Memory: 26M/525M
[INFO] ------------------------------------------------------------------------
_

The failed two tests In trunk are the same as the last run, one of which(testResetZooKeeperSession#TestReplicationPeer) could passed separately,
and the other is related to HBASE-4874
                  
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff
>
>


        

[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4862:
--------------------------

    Status: Patch Available  (was: Open)

TestHLogSplit passed on MacBook.

Rerun test suite on Jenkins.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff
>
>


        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158266#comment-13158266 ] 

Hudson commented on HBASE-4862:
-------------------------------

Integrated in HBase-0.92-security #20 (See [https://builds.apache.org/job/HBase-0.92-security/20/])
    HBASE-4862  Splitting hlog and opening region concurrently may cause data loss
               (Chunhui Shen) move JIRA to 0.90 section in CHANGES.txt
HBASE-4862  Splitting hlog and opening region concurrently may cause data loss
               (Chunhui Shen)

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java

                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862-0.92.txt, 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff, hbase-4862v7for0.90.patch, hbase-4862v7fortrunk.patch
>
>


        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "chunhui shen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157678#comment-13157678 ] 

chunhui shen commented on HBASE-4862:
-------------------------------------

@Jonathan
What happens if the .temp gets left behind without being renamed?
If the .temp file gets left behind, it means the log split failed, and the .temp file will be deleted by the next log split.
You can see that, for the same hlog file being split, it creates a file with the same name in the region's recovered.edits directory.

Thanks for your suggestion.
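
The .temp handling described above can be sketched roughly as follows. This is a minimal illustration on a local filesystem with java.nio.file, not the code in the attached patches: the class name, method name, and the use of plain text lines as "edits" are all assumptions. The point it shows is that a retry of the same split targets the same file name, so a stale .temp from a failed attempt can simply be discarded before writing again, and the final name only appears once the file is complete.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

/** Hypothetical helper, for illustration only; not the HLogSplitter code. */
final class TempThenRenameWriter {
    static final String TEMP_SUFFIX = ".temp";

    /** Writes edits to name + ".temp", then renames it to name when complete. */
    static Path writeRecoveredEdits(Path editsDir, String name, List<String> edits)
            throws IOException {
        Files.createDirectories(editsDir);
        Path temp = editsDir.resolve(name + TEMP_SUFFIX);
        Path finalPath = editsDir.resolve(name);
        // A leftover .temp means an earlier split of this log failed.
        // The retry uses the same name, so the stale file is removed first.
        Files.deleteIfExists(temp);
        Files.write(temp, edits, StandardCharsets.UTF_8);
        // Publish under the final name only after every edit is written.
        return Files.move(temp, finalPath, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

A reader that ignores names ending in .temp would then never observe, replay, or delete a partially written file.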


                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff
>
>


        

[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4862:
--------------------------

    Status: Open  (was: Patch Available)
    
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff
>
>


        

[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4862:
--------------------------

    Status: Patch Available  (was: Open)
    
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff
>
>


        

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158151#comment-13158151 ] 

Hadoop QA commented on HBASE-4862:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12505287/4862-0.92.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/389//console

This message is automatically generated.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862-0.92.txt, 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff, hbase-4862v7for0.90.patch, hbase-4862v7fortrunk.patch
>
>
> Case Description:
> 1. The split-hlog thread creates a writer for the file region A/recovered.edits/123456 and is appending log entries to it.
> 2. Meanwhile the regionserver is opening region A, and during replayRecoveredEditsIfAny() it deletes the file region A/recovered.edits/123456.
> 3. The split-hlog thread catches the IOException and stops parsing this log file. If skipError = true, the file is added to the corrupt logs; however, the data for other regions in this log file is lost.
> 4. If skipError = false, it checks the filesystem. The filesystem is fine, of course, so it only prints an error log and continues assigning regions. Therefore, data in the other log files is lost as well.
> The case may happen as follows:
> 1. Move a region from server A to server B
> 2. Kill server A and server B
> 3. Restart server A and server B
> We could prevent this exception by forbidding deletion of a recovered.edits file that the split-hlog thread is still appending to.

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158321#comment-13158321 ] 

Hudson commented on HBASE-4862:
-------------------------------

Integrated in HBase-TRUNK-security #12 (See [https://builds.apache.org/job/HBase-TRUNK-security/12/])
    HBASE-4862  Splitting hlog and opening region concurrently may cause data loss
               (Chunhui Shen) Move JIRA to 0.90 section
HBASE-4862  Splitting hlog and opening region concurrently may cause data loss
               (Chunhui Shen)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java

                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862-0.92.txt, 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff, hbase-4862v7for0.90.patch, hbase-4862v7fortrunk.patch

[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4862:
--------------------------

    Status: Open  (was: Patch Available)
    
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157919#comment-13157919 ] 

Hadoop QA commented on HBASE-4862:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12505252/4862-v6-90.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/380//console

This message is automatically generated.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862-v6-90.txt, 4862-v6-trunk.txt, 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157341#comment-13157341 ] 

Ted Yu commented on HBASE-4862:
-------------------------------

When attaching a patch, please grant the license to the ASF.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff

[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "chunhui shen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-4862:
--------------------------------

    Attachment: hbase-4862v5fortrunk.diff
                hbase-4862v5for0.90.diff

Added a test case in patch v5.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff

[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4862:
--------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12505162/4862.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 javadoc.  The javadoc tool appears to have generated -162 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 67 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
     

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/371//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/371//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/371//console

This message is automatically generated.)
    
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157406#comment-13157406 ] 

Ted Yu commented on HBASE-4862:
-------------------------------

Thanks for the quick turnaround.
{code}
+            throw new IOException("Failed rename " + wap.p + " to " + dst);
{code}
The above should read 'Failed renaming '.

For HLog.java:
{code}
+          if (p.getName().endsWith(RECOVERED_LOG_TMPFILE_SUFFIX))
+            result = false;
{code}
Please add curly braces for the above as well.
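
The two snippets above appear to belong to a write-to-temp-then-rename scheme: the splitter writes to a file carrying a temporary suffix, renames it once it is complete, and the region-open path skips files that still carry the suffix. A minimal sketch of that idea follows. It is an illustration only, not the patch itself: it uses java.nio.file on a local filesystem instead of the Hadoop FileSystem API, the class and method names are hypothetical, and the ".temp" value of RECOVERED_LOG_TMPFILE_SUFFIX is an assumption (the patch excerpt shows only the constant's name).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration; not part of HBase.
class RecoveredEditsSketch {
    // Assumed value; the review excerpt only shows the constant's name.
    static final String RECOVERED_LOG_TMPFILE_SUFFIX = ".temp";

    /** Splitter side: write edits to a temp file, then rename it into place. */
    static Path writeEdits(Path editsDir, String name, List<String> edits)
            throws IOException {
        Files.createDirectories(editsDir);
        Path tmp = editsDir.resolve(name + RECOVERED_LOG_TMPFILE_SUFFIX);
        Files.write(tmp, edits, StandardCharsets.UTF_8);
        Path dst = editsDir.resolve(name);
        try {
            // The final name becomes visible only after the file is complete.
            Files.move(tmp, dst, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new IOException("Failed renaming " + tmp + " to " + dst, e);
        }
        return dst;
    }

    /** Region-open side: list finished files only, skipping in-progress ones. */
    static List<Path> listReplayableEdits(Path editsDir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(editsDir)) {
            for (Path p : stream) {
                if (p.getFileName().toString().endsWith(RECOVERED_LOG_TMPFILE_SUFFIX)) {
                    continue; // still being written by the splitter
                }
                result.add(p);
            }
        }
        return result;
    }
}
```

With this arrangement the replay path never sees, and so never deletes, a file the splitter is still appending to, which is the race described in the issue.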
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157327#comment-13157327 ] 

Hadoop QA commented on HBASE-4862:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12505162/4862.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 javadoc.  The javadoc tool appears to have generated -162 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 67 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
     

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/371//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/371//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/371//console

This message is automatically generated.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157334#comment-13157334 ] 

Hadoop QA commented on HBASE-4862:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12505167/4862.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 javadoc.  The javadoc tool appears to have generated -162 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 67 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
     

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/373//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/373//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/373//console

This message is automatically generated.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "chunhui shen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157638#comment-13157638 ] 

chunhui shen commented on HBASE-4862:
-------------------------------------

@Jonathan
I think we could add a test for this patch by running the region's replayRecoveredEditsIfAny() right after the writer is created during log splitting.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff

[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157662#comment-13157662 ] 

Hadoop QA commented on HBASE-4862:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12505225/hbase-4862v5fortrunk.diff
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -162 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 67 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                  org.apache.hadoop.hbase.master.TestRollingRestart
                  org.apache.hadoop.hbase.util.TestRegionSplitter
                  org.apache.hadoop.hbase.client.TestMultiParallel
                  org.apache.hadoop.hbase.master.TestRestartCluster
                  org.apache.hadoop.hbase.thrift2.TestThriftHBaseServiceHandler
                  org.apache.hadoop.hbase.client.TestInstantSchemaChange
                  org.apache.hadoop.hbase.regionserver.wal.TestHLogBench
                  org.apache.hadoop.hbase.rest.TestGzipFilter
                  org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD
                  org.apache.hadoop.hbase.regionserver.TestAtomicOperation
                  org.apache.hadoop.hbase.rest.TestScannersWithFilters
                  org.apache.hadoop.hbase.TestInfoServers
                  org.apache.hadoop.hbase.regionserver.TestParallelPut
                  org.apache.hadoop.hbase.coprocessor.TestClassLoading
                  org.apache.hadoop.hbase.client.TestAdmin
                  org.apache.hadoop.hbase.regionserver.wal.TestLogRolling
                  org.apache.hadoop.hbase.filter.TestColumnRangeFilter
                  org.apache.hadoop.hbase.mapred.TestTableInputFormat
                  org.apache.hadoop.hbase.client.TestHCM
                  org.apache.hadoop.hbase.regionserver.TestStoreFileBlockCacheSummary
                  org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildHole
                  org.apache.hadoop.hbase.coprocessor.TestMasterObserver
                  org.apache.hadoop.hbase.rest.TestStatusResource
                  org.apache.hadoop.hbase.TestRegionRebalancing
                  org.apache.hadoop.hbase.regionserver.wal.TestLogRollAbort
                  org.apache.hadoop.hbase.rest.TestVersionResource
                  org.apache.hadoop.hbase.client.TestScannerTimeout
                  org.apache.hadoop.hbase.client.TestFromClientSide
                  org.apache.hadoop.hbase.regionserver.TestFSErrorsExposed
                  org.apache.hadoop.hbase.coprocessor.TestAggregateProtocol
                  org.apache.hadoop.hbase.rest.TestRowResource
                  org.apache.hadoop.hbase.rest.TestScannerResource
                  org.apache.hadoop.hbase.ipc.TestDelayedRpc
                  org.apache.hadoop.hbase.rest.client.TestRemoteAdmin
                  org.apache.hadoop.hbase.util.TestFSUtils
                  org.apache.hadoop.hbase.master.TestDistributedLogSplitting
                  org.apache.hadoop.hbase.rest.TestTableResource
                  org.apache.hadoop.hbase.regionserver.wal.TestWALReplay
                  org.apache.hadoop.hbase.util.TestIdLock
                  org.apache.hadoop.hbase.catalog.TestCatalogTrackerOnCluster
                  org.apache.hadoop.hbase.rest.TestTransform
                  org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint
                  org.apache.hadoop.hbase.client.TestInstantSchemaChangeSplit
                  org.apache.hadoop.hbase.regionserver.TestHRegion
                  org.apache.hadoop.hbase.client.TestMultipleTimestamps
                  org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort
                  org.apache.hadoop.hbase.catalog.TestMetaReaderEditor
                  org.apache.hadoop.hbase.regionserver.TestHRegionServerBulkLoad
                  org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
                  org.apache.hadoop.hbase.client.TestMetaScanner
                  org.apache.hadoop.hbase.io.hfile.TestHFileBlock
                  org.apache.hadoop.hbase.client.TestTimestampsFilter
                  org.apache.hadoop.hbase.client.TestInstantSchemaChangeFailover
                  org.apache.hadoop.hbase.client.TestShell
                  org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable
                  org.apache.hadoop.hbase.coprocessor.TestRegionObserverBypass
                  org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction
                  org.apache.hadoop.hbase.rest.TestSchemaResource
                  org.apache.hadoop.hbase.TestAcidGuarantees
                  org.apache.hadoop.hbase.master.TestZKBasedOpenCloseRegion
                  org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildBase
                  org.apache.hadoop.hbase.avro.TestAvroServer
                  org.apache.hadoop.hbase.rest.client.TestRemoteTable
                  org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol
                  org.apache.hadoop.hbase.util.TestHBaseFsck
                  org.apache.hadoop.hbase.coprocessor.TestMasterCoprocessorExceptionWithRemove
                  org.apache.hadoop.hbase.client.TestHTableUtil
                  org.apache.hadoop.hbase.regionserver.wal.TestHLogSplit
                  org.apache.hadoop.hbase.thrift.TestThriftServer
                  org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface
                  org.apache.hadoop.hbase.util.TestMergeTool
                  org.apache.hadoop.hbase.regionserver.TestRegionServerMetrics
                  org.apache.hadoop.hbase.util.TestMergeTable
                  org.apache.hadoop.hbase.master.TestMasterFailover
                  org.apache.hadoop.hbase.regionserver.wal.TestHLog
                  org.apache.hadoop.hbase.rest.TestMultiRowResource
                  org.apache.hadoop.hbase.TestMultiVersions
                  org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildOverlap
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.master.TestMasterTransitions
                  org.apache.hadoop.hbase.master.TestSplitLogManager
                  org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithRemove
                  org.apache.hadoop.hbase.coprocessor.TestWALObserver
                  org.apache.hadoop.hbase.TestZooKeeper
                  org.apache.hadoop.hbase.master.TestOpenedRegionHandler
                  org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine
                  org.apache.hadoop.hbase.coprocessor.TestMasterCoprocessorExceptionWithAbort

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/378//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/378//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/378//console

This message is automatically generated.
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff
>
>
> Case Description:
> 1. The split-hlog thread creates a writer for the file region A/recovered.edits/123456 and is appending log entries.
> 2. The regionserver is opening region A at the same time, and replayRecoveredEditsIfAny() deletes the file region A/recovered.edits/123456.
> 3. The split-hlog thread catches the IOException and stops parsing this log file. If skipError = true, the log is added to the corrupt logs. However, data for other regions in this log file will be lost.
> 4. If skipError = false, it checks the filesystem. The filesystem is fine, of course, so it only prints an error log and continues assigning regions. Therefore, data in other log files will also be lost.
> The case may happen as follows:
> 1. Move a region from server A to server B.
> 2. Kill server A and server B.
> 3. Restart server A and server B.
> We could prevent this exception by forbidding deletion of a recovered.edits file
> that is still being appended to by the split-hlog thread.
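
The race in the case description, and one way to avoid it, can be sketched in plain Java. This is only an illustration and is not taken from any of the attached patches: the class name, the ".temp" suffix, and both method names are hypothetical, and java.nio.file on a local directory stands in for the HBase filesystem layer. The idea shown is that the writer appends under a temporary name and publishes the file with a rename, while the opener replays and deletes only files that do not carry the in-progress suffix.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class RecoveredEditsSketch {
    // Hypothetical marker for a file the splitter is still appending to.
    static final String IN_PROGRESS_SUFFIX = ".temp";

    /** Splitter side: write under a temporary name, then publish with a rename. */
    static Path writeEdits(Path editsDir, String name, List<String> entries) throws IOException {
        Files.createDirectories(editsDir);
        Path tmp = editsDir.resolve(name + IN_PROGRESS_SUFFIX);
        try (BufferedWriter w = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            for (String entry : entries) {
                w.write(entry);
                w.newLine();
            }
        }
        // Only after the writer is closed does the file become visible under its final name.
        return Files.move(tmp, editsDir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    /** Opener side: replay and delete finished files only; skip in-progress ones. */
    static List<String> replayAndDelete(Path editsDir) throws IOException {
        List<String> replayed = new ArrayList<>();
        if (!Files.isDirectory(editsDir)) {
            return replayed;
        }
        List<Path> files = new ArrayList<>();
        try (Stream<Path> listing = Files.list(editsDir)) {
            listing.sorted().forEach(files::add);
        }
        for (Path file : files) {
            if (file.getFileName().toString().endsWith(IN_PROGRESS_SUFFIX)) {
                continue; // still being appended to by the splitter: do not touch it
            }
            replayed.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
            Files.delete(file);
        }
        return replayed;
    }
}
```

With this arrangement, an opener that runs while the splitter is mid-write sees only the ".temp" file and leaves it in place, so the splitter never hits an IOException from a file vanishing underneath it.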


[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4862:
--------------------------

    Attachment: 4862-v6-trunk.txt

Patch v6 with javadoc updated according to reviews
                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862-v6-trunk.txt, 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff
