Posted to issues@hbase.apache.org by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org> on 2011/01/04 21:11:53 UTC

[jira] Created: (HBASE-3412) HLogSplitter should handle missing HLogs

HLogSplitter should handle missing HLogs
----------------------------------------

                 Key: HBASE-3412
                 URL: https://issues.apache.org/jira/browse/HBASE-3412
             Project: HBase
          Issue Type: Bug
            Reporter: Jean-Daniel Cryans
            Priority: Critical
             Fix For: 0.90.0


In build #48 (https://hudson.apache.org/hudson/job/hbase-0.90/48/), TestReplication failed because of missing rows on the slave cluster. The reason is that a region server that was killed was able to archive a log at the same time the master was trying to recover it:

{noformat}
[MASTER_META_SERVER_OPERATIONS-vesta.apache.org:47907-0] util.FSUtils(625):
 Recovering file hdfs://localhost:50121/user/hudson/.logs/vesta.apache.org,58598,1294117333857/vesta.apache.org%3A58598.1294117406909
...
[RegionServer:0;vesta.apache.org,58598,1294117333857.logRoller] wal.HLog(740):
 moving old hlog file /user/hudson/.logs/vesta.apache.org,58598,1294117333857/vesta.apache.org%3A58598.1294117406909
 whose highest sequenceid is 422 to /user/hudson/.oldlogs/vesta.apache.org%3A58598.1294117406909
...
[MASTER_META_SERVER_OPERATIONS-vesta.apache.org:47907-0] master.MasterFileSystem(204):
 Failed splitting hdfs://localhost:50121/user/hudson/.logs/vesta.apache.org,58598,1294117333857
 java.io.IOException: Failed to open hdfs://localhost:50121/user/hudson/.logs/vesta.apache.org,58598,1294117333857/vesta.apache.org%3A58598.1294117406909 for append
Caused by: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException:
 org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException:
 No lease on /user/hudson/.logs/vesta.apache.org,58598,1294117333857/vesta.apache.org%3A58598.1294117406909
 File does not exist. [Lease.  Holder: DFSClient_-986975908, pendingcreates: 1]
{noformat}

We should probably just handle the fact that a file could have been archived (maybe even check in .oldlogs to be sure) and move on to the next log.
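As a rough illustration of the ".oldlogs" check suggested above, here is a minimal sketch. It is not HBase code: the class, enum, and method names are hypothetical, and java.nio on a local directory stands in for the HDFS calls a real splitter would make.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of HBase. Local directories stand in for
// the .logs/<server> and .oldlogs directories on HDFS.
final class ArchivedLogCheck {

  enum LogState { PRESENT, ARCHIVED, MISSING }

  // Reports where a log with the given name currently lives.
  // logDir plays the role of .logs/<server>, oldLogDir the role of .oldlogs.
  static LogState locate(Path logDir, Path oldLogDir, String logName) {
    if (Files.exists(logDir.resolve(logName))) {
      return LogState.PRESENT;
    }
    if (Files.exists(oldLogDir.resolve(logName))) {
      // The region server archived the log before the master got to it.
      return LogState.ARCHIVED;
    }
    return LogState.MISSING;
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("hlog-sketch");
    Path logs = Files.createDirectories(root.resolve(".logs"));
    Path oldLogs = Files.createDirectories(root.resolve(".oldlogs"));
    Files.createFile(oldLogs.resolve("archived.log"));
    System.out.println(locate(logs, oldLogs, "archived.log"));
  }
}
```

A caller could treat ARCHIVED as "safe to skip" and log MISSING more loudly, since in that case the edits cannot be accounted for.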

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-3412) HLogSplitter should handle missing HLogs

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977978#action_12977978 ] 

Hudson commented on HBASE-3412:
-------------------------------

Integrated in HBase-TRUNK #1703 (See [https://hudson.apache.org/hudson/job/HBase-TRUNK/1703/])
    HBASE-3412  HLogSplitter should handle missing HLogs





[jira] Updated: (HBASE-3412) HLogSplitter should handle missing HLogs

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-3412:
--------------------------------------

    Attachment: HBASE-3412-2.patch

Patch that adds FS verifications and only throws a FileNotFoundException (FNFE) if the exception message contains the right string.
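A minimal sketch of what such a message-based conversion could look like, assuming the marker text is the "File does not exist" string seen in the LeaseExpiredException from the issue description. This is not the attached patch: LeaseErrorTranslator and its method are hypothetical names, and a local stand-in class replaces the Hadoop exception so the example compiles on its own.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Stand-in for org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException
// so the sketch compiles without Hadoop on the classpath.
class LeaseExpiredException extends IOException {
  LeaseExpiredException(String message) {
    super(message);
  }
}

// Hypothetical helper, not the code from HBASE-3412-2.patch.
final class LeaseErrorTranslator {

  // Assumed marker, copied from the exception text in the issue description.
  static final String MISSING_FILE_MARKER = "File does not exist";

  // Returns a FileNotFoundException only when a lease error says the file is
  // gone; any other exception is handed back unchanged so it still surfaces.
  static IOException translate(IOException e, String path) {
    String message = e.getMessage();
    if (e instanceof LeaseExpiredException
        && message != null
        && message.contains(MISSING_FILE_MARKER)) {
      FileNotFoundException fnfe =
          new FileNotFoundException("The given HLog wasn't found at " + path);
      fnfe.initCause(e); // keep the original lease error for debugging
      return fnfe;
    }
    return e;
  }
}
```

Matching on message text is brittle, since the wording can change between Hadoop versions, but it keeps unrelated lease errors from being silently treated as a missing file.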




[jira] Commented: (HBASE-3412) HLogSplitter should handle missing HLogs

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977570#action_12977570 ] 

Jean-Daniel Cryans commented on HBASE-3412:
-------------------------------------------

bq. The test doesn't assert anything. How do I know it is successful? Should you check the FS for anything?

Since I mock the FS to get the LEE, nothing changes on the filesystem... so do you think I should check that everything is still where it's supposed to be, i.e. all the logs in the logs folder?

bq. This seems dangerous. Is it?

Less dangerous than not handling it IMO, since currently it cancels the whole log replay process. If your concern is that we might miss other kinds of exceptions hidden in LEE, then I think we could grep the exception message for "File does not exist" and otherwise let the exception come out like it currently does... although it really bothers me to do that since it cancels log splitting and guarantees data loss even if other logs after the one that throws the exception were fine.
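To make the "move on to the next log" behaviour concrete, here is a small sketch of a replay loop that skips a log that has disappeared instead of aborting the whole pass. It is only an illustration under simplifying assumptions: local files read with java.nio stand in for HLogs on HDFS, each line stands in for an edit, and SkipMissingLogs and replay are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not HLogSplitter itself.
final class SkipMissingLogs {

  // Reads every log in order and returns the collected lines ("edits").
  // Logs that no longer exist are recorded in 'skipped' and do not stop the
  // loop; any other IOException still propagates to the caller.
  static List<String> replay(List<Path> logs, List<Path> skipped) throws IOException {
    List<String> edits = new ArrayList<>();
    for (Path log : logs) {
      try {
        edits.addAll(Files.readAllLines(log));
      } catch (NoSuchFileException e) {
        // The log vanished between listing and reading, e.g. it was archived.
        skipped.add(log);
      }
    }
    return edits;
  }
}
```

With logs a, b, c where b has been moved away, the edits from a and c are still replayed and b ends up in the skipped list, rather than the failure on b discarding c as well.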




[jira] Updated: (HBASE-3412) HLogSplitter should handle missing HLogs

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-3412:
--------------------------------------

    Attachment: HBASE-3412.patch

Patch that converts the LeaseExpiredException (LEE) into a FileNotFoundException (FNFE) and handles it correctly. Also includes a small test.




[jira] Commented: (HBASE-3412) HLogSplitter should handle missing HLogs

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977568#action_12977568 ] 

stack commented on HBASE-3412:
------------------------------

The test doesn't assert anything.  How do I know it is successful?  Should you check the FS for anything?

This seems dangerous.  Is it?

{code}
+        } else if (e instanceof LeaseExpiredException) {
+          // This exception comes out instead of FNFE, fix it
+          throw new FileNotFoundException(
+              "The given HLog wasn't found at " + p.toString());
{code}

Otherwise, looks good.




[jira] Commented: (HBASE-3412) HLogSplitter should handle missing HLogs

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977610#action_12977610 ] 

stack commented on HBASE-3412:
------------------------------

+1

Thanks for making improvements.




[jira] Resolved: (HBASE-3412) HLogSplitter should handle missing HLogs

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans resolved HBASE-3412.
---------------------------------------

      Resolution: Fixed
        Assignee: Jean-Daniel Cryans
    Hadoop Flags: [Reviewed]

Committed to branch and trunk, thanks for the review Stack!

