Posted to issues@hbase.apache.org by "Hairong Kuang (JIRA)" <ji...@apache.org> on 2010/11/29 22:30:11 UTC

[jira] Created: (HBASE-3285) Hlog recovery takes too much time

Hlog recovery takes too much time
---------------------------------

                 Key: HBASE-3285
                 URL: https://issues.apache.org/jira/browse/HBASE-3285
             Project: HBase
          Issue Type: Bug
            Reporter: Hairong Kuang
            Assignee: Hairong Kuang


Currently, HBase uses append to trigger the close of the HLog during HLog split. Append is a very expensive operation: it involves not only NameNode operations but also the creation of a write pipeline. If one of the datanodes in the pipeline has a problem, this recovery may take minutes. I'd like to implement a lightweight NameNode operation to trigger lease recovery and make HBase use it instead.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-3285) Hlog recovery takes too much time

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3285:
-------------------------

    Attachment: 3285-v3.txt

Fix a noisy log message in the case where the recoverLease method is missing.

Tested on a cluster w/o the API and it seems to work fine.

Will apply this patch to branch and trunk unless there are objections.  It makes use of the fs.recoverLease API if present, which purportedly skirts the issue where we get stuck in a loop trying to grab the lease when opening a file for append.


[jira] Updated: (HBASE-3285) Hlog recovery takes too much time

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3285:
-------------------------

    Attachment: 0004-Added-in-the-hdfs-1520-1555-1554-from-tip-of-the-bra.patch

Updated patch for cdh3b2.


[jira] Commented: (HBASE-3285) Hlog recovery takes too much time

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998549#comment-12998549 ] 

stack commented on HBASE-3285:
------------------------------

@Hairong Are there new exceptions when hdfs-1554 is in place?  This patch seems to just catch and wait and change what's logged, not call some new API (oh, it also adds a new conversion to FNFE).  Thanks.


[jira] Updated: (HBASE-3285) Hlog recovery takes too much time

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3285:
-------------------------

    Attachment: 3285-v2.txt

Made the patch work when recoverLease is not available by stealing from Nicolas's HBASE-2312 patch; he uses reflection to test for recoverLease, falling back on append if it's missing (as opposed to this patch, which just uses recoverLease).  Running tests.
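
For illustration, the reflection fallback described above can be sketched roughly as follows. This is a minimal, self-contained sketch and not the code from 3285-v2.txt or HBASE-2312: OldFs, NewFs, and LeaseRecoverySketch are hypothetical stand-ins for the real Hadoop filesystem classes, and the String path parameter is an assumption made to keep the example free of Hadoop dependencies.

```java
import java.lang.reflect.Method;

/** Hypothetical stand-in for a filesystem client that only supports append. */
class OldFs {
    public String append(String path) {
        return "append:" + path;
    }
}

/** Hypothetical stand-in for a filesystem client that also exposes recoverLease. */
class NewFs extends OldFs {
    public boolean recoverLease(String path) {
        return true;
    }
}

class LeaseRecoverySketch {
    /**
     * Looks up recoverLease by reflection and invokes it if present;
     * otherwise falls back to the append-based trigger.
     * Returns the name of the method that was actually used.
     */
    static String recover(Object fs, String path) {
        try {
            Method recoverLease;
            try {
                recoverLease = fs.getClass().getMethod("recoverLease", String.class);
            } catch (NoSuchMethodException missing) {
                // Older client without the API: use the expensive append path.
                Method append = fs.getClass().getMethod("append", String.class);
                append.invoke(fs, path);
                return "append";
            }
            recoverLease.invoke(fs, path);
            return "recoverLease";
        } catch (ReflectiveOperationException e) {
            // Neither method could be looked up or invoked on this object.
            throw new IllegalStateException("could not trigger lease recovery", e);
        }
    }
}
```

Calling LeaseRecoverySketch.recover with a NewFs takes the recoverLease branch, while an OldFs takes the append branch, so the same caller works against either client.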


[jira] Commented: (HBASE-3285) Hlog recovery takes too much time

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998551#comment-12998551 ] 

Hairong Kuang commented on HBASE-3285:
--------------------------------------

recoverLease does not introduce any new exceptions. The patch changes HBase to use recoverLease, not append, to recover the lease. HDFS-1554 did not introduce a new API, but changed the recoverLease semantics.
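
A rough sketch of how a caller might drive lease recovery with recoverLease instead of append: ask for recovery, then poll until the file is reported closed. The sketch assumes the recovery call returns a boolean meaning "file is closed"; this thread does not spell out the return value, so treat that as an assumption. LeaseRecoveryPoller and pollUntilClosed are hypothetical names, and the BooleanSupplier stands in for the real filesystem call.

```java
import java.util.function.BooleanSupplier;

class LeaseRecoveryPoller {
    /**
     * Invokes the recovery attempt until it reports success or the attempt
     * budget is used up. Returns the number of attempts made on success,
     * or -1 if recovery never succeeded or the thread was interrupted.
     */
    static int pollUntilClosed(BooleanSupplier attempt, int maxAttempts, long pauseMillis) {
        for (int i = 1; i <= maxAttempts; i++) {
            // Assumption: true means the file is closed and safe to read.
            if (attempt.getAsBoolean()) {
                return i;
            }
            if (i < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and give up.
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1;
    }
}
```

Bounding the number of attempts keeps a log splitter from spinning forever on a file whose lease cannot be recovered; the pause length and budget here are illustrative, not values taken from the patch.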


[jira] Commented: (HBASE-3285) Hlog recovery takes too much time

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979000#action_12979000 ] 

Hairong Kuang commented on HBASE-3285:
--------------------------------------

HDFS-1554 provides new semantics that will speed up lease recovery (no need to wait for soft lease expiration).



[jira] Commented: (HBASE-3285) Hlog recovery takes too much time

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002921#comment-13002921 ] 

stack commented on HBASE-3285:
------------------------------

All tests on 0.90 pass w/ this patch in place, except TestReplication, which failed in the full run but passed when I ran it alone.


[jira] Commented: (HBASE-3285) Hlog recovery takes too much time

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965038#action_12965038 ] 

Jean-Daniel Cryans commented on HBASE-3285:
-------------------------------------------

Would this new NN operation still go through lease recovery? In my experience, waiting on the leases to expire has been our biggest source of downtime.



[jira] Updated: (HBASE-3285) Hlog recovery takes too much time

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3285:
-------------------------

       Resolution: Fixed
    Fix Version/s: 0.90.2
     Release Note: Adds hbase exploitation of the new lease recovery added by hdfs-1520.  The new API is available at the tip of branch-0.20-append, ahead of the version of hadoop that ships with hbase 0.90.1 (r1057313).  Must patch CDH to add the API.
           Status: Resolved  (was: Patch Available)


[jira] Updated: (HBASE-3285) Hlog recovery takes too much time

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HBASE-3285:
---------------------------------

    Attachment: hbaseRecoverHLog.patch

Here is the patch that uses the new recoverLease API to recover HLog.

I assume that CDH will pull the Hadoop recoverLease change into its distribution.



[jira] Commented: (HBASE-3285) Hlog recovery takes too much time

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003510#comment-13003510 ] 

stack commented on HBASE-3285:
------------------------------

Unit tests are broken on branch-0.20-append.  In particular, TestFileAppend4 fails in testRecoverFinalizedBlock.  It was broken before these 1520, 1555, 1554 patches went in.  TestMultiThreadedSync is also failing.

Before applying the patches to CDH3b2, I had TestSyncingWriterInterrupted failing.  It seems sporadic though, because after applying the 1520+ series of patches, only TestFileAppend4 failed.

Going to apply this patch to hbase.


[jira] Commented: (HBASE-3285) Hlog recovery takes too much time

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002939#comment-13002939 ] 

stack commented on HBASE-3285:
------------------------------

Did a bit of server killing on a cluster that was up on the tip of branch-0.20-append -- i.e. had hdfs-1520, 1555, and 1554 -- and recovery of the lease seems to run nice and promptly.  Testing now on the branch-0.20-append that hbase 0.90.1 ships with, which lacks 1520, 1555, and 1554.


[jira] Updated: (HBASE-3285) Hlog recovery takes too much time

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3285:
-------------------------

    Attachment: hdfs-1520,1555,1554-for-cdh3b2.txt

Patch for cdh3b2 adding recent additions from branch-0.20-append (hdfs-1520, 1555, 1554), used for testing that hlog recovery works on cdh too.  Running tests, I see a bunch fail -- TestPread, TestFileCreation, TestDatanodeBlockScanner.  Will try w/o this patch to see if I get the same failure set.


[jira] Commented: (HBASE-3285) Hlog recovery takes too much time

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965579#action_12965579 ] 

Hairong Kuang commented on HBASE-3285:
--------------------------------------

This issue does not speed up the lease recovery process but will reduce the cost of triggering lease recovery. In our environment, we have observed append taking over 10 minutes to finish due to a problematic datanode in the pipeline.



[jira] Updated: (HBASE-3285) Hlog recovery takes too much time

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3285:
-------------------------

    Status: Patch Available  (was: Open)


[jira] Commented: (HBASE-3285) Hlog recovery takes too much time

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003606#comment-13003606 ] 

Hudson commented on HBASE-3285:
-------------------------------

Integrated in HBase-TRUNK #1771 (See [https://hudson.apache.org/hudson/job/HBase-TRUNK/1771/])
    HBASE-3285 Hlog recovery takes too much time

