
Posted to issues@hbase.apache.org by "Elliott Clark (JIRA)" <ji...@apache.org> on 2012/06/05 23:15:23 UTC

[jira] [Created] (HBASE-6165) Replication can overrun .META scans on cluster re-start

Elliott Clark created HBASE-6165:
------------------------------------

             Summary: Replication can overrun .META scans on cluster re-start
                 Key: HBASE-6165
                 URL: https://issues.apache.org/jira/browse/HBASE-6165
             Project: HBase
          Issue Type: Bug
            Reporter: Elliott Clark


When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Zhihong Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433703#comment-13433703 ] 

Zhihong Ted Yu commented on HBASE-6165:
---------------------------------------

Yes, please.
An aborted test run is different from a compilation error.
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch, HBase-6165-v2.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Himanshu Vashishtha (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442790#comment-13442790 ] 

Himanshu Vashishtha commented on HBASE-6165:
--------------------------------------------

The above patch was for trunk;
will upload a 0.94 one.
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452491#comment-13452491 ] 

Jean-Daniel Cryans commented on HBASE-6165:
-------------------------------------------

[~whitingj], originally replication was using the normal handlers and was just deadlocking the clusters in a different way. ReplicationSink uses the HBase client, which can block for ungodly amounts of time, so it would fill up the handlers and the RS would stop serving requests. HBASE-6550 changed that a bit by setting low timeouts via replication-specific client-side configuration parameters (if it was using the normal client configurations it would also affect all the other clients). With HBASE-6165 it's even safer since replication is sandboxed.
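
The idea of replication-specific client settings can be sketched as follows. This is only an illustration, assuming the configuration is modelled as a plain map: the class `SinkConfSketch`, the method `decorate`, and the numeric values are hypothetical and are not taken from the HBASE-6550 patch. The point is that the sink works on its own copy, so lowering its retries and timeout leaves the shared client settings alone.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; a plain Map stands in for the real configuration object. */
public class SinkConfSketch {
    // Standard HBase client keys; the values used below are illustrative only.
    static final String RETRIES = "hbase.client.retries.number";
    static final String OP_TIMEOUT = "hbase.client.operation.timeout";

    /** Returns a private copy of the shared settings with fail-fast overrides applied. */
    public static Map<String, String> decorate(Map<String, String> shared,
                                               int retries, int timeoutMillis) {
        Map<String, String> sinkConf = new HashMap<>(shared); // copy, never mutate shared
        sinkConf.put(RETRIES, Integer.toString(retries));
        sinkConf.put(OP_TIMEOUT, Integer.toString(timeoutMillis));
        return sinkConf;
    }

    public static void main(String[] args) {
        Map<String, String> shared = new HashMap<>();
        shared.put(RETRIES, "10");
        Map<String, String> sinkConf = decorate(shared, 4, 10000);
        System.out.println("shared retries: " + shared.get(RETRIES));
        System.out.println("sink retries:   " + sinkConf.get(RETRIES));
    }
}
```

Other clients that read the shared map keep their original retry count; only code handed the decorated copy sees the shorter limits.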
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6165-v6.txt, HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start

Posted by "Zhihong Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432050#comment-13432050 ] 

Zhihong Ted Yu commented on HBASE-6165:
---------------------------------------

+1 on shifting away from using HTablePool in the JIRA for fail-fast.
                
> Replication can overrun .META scans on cluster re-start
> -------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Updated] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Himanshu Vashishtha (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Himanshu Vashishtha updated HBASE-6165:
---------------------------------------

    Attachment: HBase-6165-v5.patch

patch refreshed
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Updated] (HBASE-6165) Replication can overrun .META scans on cluster re-start

Posted by "Himanshu Vashishtha (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Himanshu Vashishtha updated HBASE-6165:
---------------------------------------

    Attachment: HBase-6165-v1.patch

I hit this problem while testing a long-running replication setup. All priority handlers were blocked by the replicationLog method, and the cluster became unresponsive.

Attached is a patch which does the following:
a) Add a different QOS level, customQOS. Methods with this attribute will be processed by a new set of handlers.
b) Add customPriorityHandlers, a new set of handlers in the RegionServer.

ReplicationSink#replicateLogEntries uses this attribute. 

Testing: Jenkins is green. I have a long-running replication setup, and it has been up for a few days.
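
As a rough illustration of the approach described above (not the code in the attached patch), the sketch below gives each QOS level its own handler pool, so calls blocked at one level cannot starve another. The class `QosDispatchSketch`, the `Qos` enum, and the pool sizes are all hypothetical names and values chosen for the example.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Toy model of per-QOS handler pools; names and sizes are assumptions. */
public class QosDispatchSketch {
    /** Hypothetical QOS levels: meta/priority traffic versus replication traffic. */
    enum Qos { PRIORITY, REPLICATION }

    private final Map<Qos, ExecutorService> handlers = new EnumMap<>(Qos.class);

    QosDispatchSketch(int priorityHandlers, int replicationHandlers) {
        handlers.put(Qos.PRIORITY, Executors.newFixedThreadPool(priorityHandlers));
        handlers.put(Qos.REPLICATION, Executors.newFixedThreadPool(replicationHandlers));
    }

    /** Routes a call to the handler pool that owns its QOS level. */
    <T> Future<T> dispatch(Qos qos, Callable<T> call) {
        return handlers.get(qos).submit(call);
    }

    void shutdown() {
        for (ExecutorService pool : handlers.values()) {
            pool.shutdownNow();
        }
    }

    /** Returns true if a priority call is served while every replication handler is blocked. */
    public static boolean metaScanServedWhileReplicationBlocked() {
        QosDispatchSketch server = new QosDispatchSketch(2, 2);
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Flood the replication pool with calls that block until released.
            for (int i = 0; i < 10; i++) {
                server.dispatch(Qos.REPLICATION, () -> { release.await(); return null; });
            }
            // The priority pool is separate, so this call still gets a handler.
            Future<String> scan = server.dispatch(Qos.PRIORITY, () -> "meta-row");
            return "meta-row".equals(scan.get(2, TimeUnit.SECONDS));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException | TimeoutException e) {
            return false;
        } finally {
            release.countDown();
            server.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("meta scan served: " + metaScanServedWhileReplicationBlocked());
    }
}
```

With a single shared pool, the same flood of blocked replication calls would occupy every handler and the meta scan would time out, which is the failure mode reported in this issue.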
                
> Replication can overrun .META scans on cluster re-start
> -------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>         Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Updated] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Himanshu Vashishtha (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Himanshu Vashishtha updated HBASE-6165:
---------------------------------------

    Attachment: HBase-6165-94-v2.patch

replicateLogEntries with replication QOS
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445311#comment-13445311 ] 

Lars Hofhansl commented on HBASE-6165:
--------------------------------------

You beat me :)
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start

Posted by "Himanshu Vashishtha (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432053#comment-13432053 ] 

Himanshu Vashishtha commented on HBASE-6165:
--------------------------------------------

Lars, Ted and Elliott: Thanks for the feedback.


@Lars: Changing the name is beyond the scope of this jira, no? Another jira for that?
re: failfast: Yeah, the patch still uses HTablePool, but submits the batch in a threadpool (of ReplicationSink). Meanwhile, the handler keeps checking whether the client is still alive or not, while waiting for the task to finish. If the client is out, it cancels the task.
Also, ReplicationSink now has its own conf object, which it can decorate with its own timeout, number of retries, etc. Is there an open jira for ReplicationSink (can't create a jira yet)?
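
A minimal sketch of the fail-fast pattern described above, assuming the batch is a `Callable` and client liveness is exposed as a simple boolean check. The class `FailFastSinkSketch`, the method `applyBatch`, and the polling interval are hypothetical and do not come from the patch.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: the handler waits on a pooled task and cancels it if the caller is gone. */
public class FailFastSinkSketch {

    /**
     * Submits the batch to the pool and polls for completion.
     * Returns true if the batch finished, false if it was cancelled
     * because clientAlive reported that the caller went away.
     */
    public static boolean applyBatch(ExecutorService pool, Callable<Void> batch,
                                     BooleanSupplier clientAlive, long pollMillis) {
        Future<Void> task = pool.submit(batch);
        try {
            while (true) {
                try {
                    task.get(pollMillis, TimeUnit.MILLISECONDS);
                    return true; // batch applied
                } catch (TimeoutException stillRunning) {
                    if (!clientAlive.getAsBoolean()) {
                        task.cancel(true); // interrupt the worker and free the handler
                        return false;
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            task.cancel(true);
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("batch failed", e.getCause());
        }
    }

    /** Runs one quick batch with a live client and one stuck batch with a dead client. */
    public static boolean[] demo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            boolean quick = applyBatch(pool, () -> null, () -> true, 10);
            boolean stuck = applyBatch(pool,
                    () -> { Thread.sleep(60_000); return null; }, () -> false, 10);
            return new boolean[] { quick, stuck };
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        boolean[] results = demo();
        System.out.println("quick batch applied: " + results[0]);
        System.out.println("stuck batch applied: " + results[1]);
    }
}
```

The handler thread is held for at most one polling interval after the client disappears, rather than for the full duration of a blocked client call.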
                
> Replication can overrun .META scans on cluster re-start
> -------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Jeff Whiting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452552#comment-13452552 ] 

Jeff Whiting commented on HBASE-6165:
-------------------------------------

@stack and @jdcryans  Thanks for the explanation.  I can see how it would deadlock on itself. I also found HBASE-3401 which talks about the deadlock.  We patched our cdh4 cluster with HBASE-6724 and it has been running much smoother.  
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6165-v6.txt, HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start

Posted by "Elliott Clark (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432023#comment-13432023 ] 

Elliott Clark commented on HBASE-6165:
--------------------------------------

@Lars
We had this happen when a large cluster is replicating to a small cluster.
Source (Large Cluster)
Sink (Small cluster)

After the sink goes down or re-starts, the source waits for meta to come up.  After that, lots of replicated WAL edits are shipped to all the servers.  So many, in fact, that the server holding meta does not have any handlers left to answer meta scans or edits.
                
> Replication can overrun .META scans on cluster re-start
> -------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>         Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444292#comment-13444292 ] 

Hadoop QA commented on HBASE-6165:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12542957/HBase-6165-94-v2.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2724//console

This message is automatically generated.
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442786#comment-13442786 ] 

Jean-Daniel Cryans commented on HBASE-6165:
-------------------------------------------

FWIW the v4 patch really doesn't apply on 0.94:

{noformat}

su-jdcryans-2:hbase-git-su jdcryans$ patch -p1 -F 10 --dry-run < HBase-6165-v4.patch 
patching file src/main/java/org/apache/hadoop/hbase/HConstants.java
Hunk #1 succeeded at 650 with fuzz 2 (offset -42 lines).
patching file src/main/java/org/apache/hadoop/hbase/ipc/HBaseRpcMetrics.java
Hunk #1 succeeded at 98 (offset -11 lines).
patching file src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
Hunk #1 succeeded at 225 (offset -51 lines).
Hunk #2 succeeded at 1304 (offset -360 lines).
Hunk #3 succeeded at 1335 with fuzz 1 (offset -414 lines).
Hunk #4 succeeded at 1356 (offset -415 lines).
Hunk #5 succeeded at 1526 (offset -405 lines).
Hunk #6 succeeded at 1630 with fuzz 3 (offset -415 lines).
Hunk #7 succeeded at 1652 (offset -423 lines).
Hunk #8 succeeded at 1664 (offset -423 lines).
patching file src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
Hunk #1 succeeded at 449 with fuzz 2 (offset 153 lines).
Hunk #2 FAILED at 658.
Hunk #3 succeeded at 486 (offset -87 lines).
Hunk #4 succeeded at 504 (offset -87 lines).
Hunk #5 succeeded at 520 (offset -87 lines).
Hunk #6 succeeded at 536 (offset -87 lines).
Hunk #7 succeeded at 3159 (offset 1061 lines).
Hunk #8 succeeded at 3170 with fuzz 1 (offset 1059 lines).
Hunk #9 succeeded at 3630 with fuzz 3 (offset 529 lines).
Hunk #10 FAILED at 3836.
Hunk #11 FAILED at 3883.
Hunk #12 FAILED at 3911.
Hunk #13 FAILED at 3998.
Hunk #14 FAILED at 4037.
Hunk #15 FAILED at 4068.
Hunk #16 FAILED at 4097.
Hunk #17 FAILED at 4131.
9 out of 17 hunks FAILED -- saving rejects to file src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java.rej
{noformat}
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Zhihong Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433756#comment-13433756 ] 

Zhihong Ted Yu commented on HBASE-6165:
---------------------------------------

@Himanshu:
I don't know the root cause of the aborted QA run.

w.r.t. queue naming, can I assume that misc(ellaneous) is acceptable to everyone ?
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch, HBase-6165-v2.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448255#comment-13448255 ] 

Hudson commented on HBASE-6165:
-------------------------------

Integrated in HBase-0.94-security #51 (See [https://builds.apache.org/job/HBase-0.94-security/51/])
    HBASE-6165 Replication can overrun .META. scans on cluster re-start (Himanshu Vashishtha) (Revision 1379236)

     Result = FAILURE
larsh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRpcMetrics.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java

                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6165-v6.txt, HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Himanshu Vashishtha (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433700#comment-13433700 ] 

Himanshu Vashishtha commented on HBASE-6165:
--------------------------------------------

On current trunk (with commit 7b9cbf0c0b35468591b3a1cf5c93951461590f8c), it applied clean. Shall I upload again?

                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch, HBase-6165-v2.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Updated] (HBASE-6165) Replication can overrun .META scans on cluster re-start

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-6165:
---------------------------------

    Fix Version/s: 0.94.2
                   0.96.0

This should be in 0.94
                
> Replication can overrun .META scans on cluster re-start
> -------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start

Posted by "Elliott Clark (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393636#comment-13393636 ] 

Elliott Clark commented on HBASE-6165:
--------------------------------------

Upping the number of privileged ipc threads is the workaround that we're going to deploy soon.
                
> Replication can overrun .META scans on cluster re-start
> -------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Himanshu Vashishtha (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445464#comment-13445464 ] 

Himanshu Vashishtha commented on HBASE-6165:
--------------------------------------------

good to know; will set up a svn/eclipse environment.
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442868#comment-13442868 ] 

Hadoop QA commented on HBASE-6165:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12542698/HBase-6165-94-v1.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2710//console

This message is automatically generated.
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-94-v1.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Himanshu Vashishtha (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445320#comment-13445320 ] 

Himanshu Vashishtha commented on HBASE-6165:
--------------------------------------------

No, I can't. You have the superpower to take it one step forward :)
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Himanshu Vashishtha (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445498#comment-13445498 ] 

Himanshu Vashishtha commented on HBASE-6165:
--------------------------------------------

Thanks for the final patch Lars :)
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6165-v6.txt, HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449409#comment-13449409 ] 

stack commented on HBASE-6165:
------------------------------

@Jeff IIRC they need to be on a channel other than the user priority queue because they can overwhelm user loadings (e.g. a big cluster replicating into a small cluster).  We've been learning a bunch of late about replication, and it's fair to say that some pieces need a bit of a rethink to make them more robust around cases such as the aforementioned large-into-small, or one we ran into ourselves recently where we couldn't start the small cluster because the high priority handlers were all occupied by replication soon after startup (this patch would help with that scenario).  I see that this patch has just been backported to 0.92; hopefully that will be of help to you in your current predicament.
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6165-v6.txt, HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433823#comment-13433823 ] 

Andrew Purtell commented on HBASE-6165:
---------------------------------------

MISC doesn't have any meaning. 

Neither does "custom".

IMO, name these after what they actually do. If this is for replication, name it REPLICATION_QOS.
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch, HBase-6165-v2.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Elliott Clark (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433725#comment-13433725 ] 

Elliott Clark commented on HBASE-6165:
--------------------------------------

It's not that custom doesn't convey enough meaning (I could live with that).  Custom implies that there's been some modification from normal or stock.  That is not the case.  These handlers are there for things that are built in.  Replication and security are core pieces of functionality.  Naming things custom gives the impression that they are not as well supported as other operations, which is not the case.
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch, HBase-6165-v2.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445212#comment-13445212 ] 

Jean-Daniel Cryans commented on HBASE-6165:
-------------------------------------------

I've been running the 0.94 patch since yesterday, +1.
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Himanshu Vashishtha (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433718#comment-13433718 ] 

Himanshu Vashishtha commented on HBASE-6165:
--------------------------------------------

@Elliott: I don't want to tie them to replication. As you can see, they have a positive default value now, so it would not be correct to call them REPLICATION_OPS.
Any method marked with CUSTOM_OPS will be handled by them. The nearest candidate to use this is the security-related methods, I think.
MISC/INTERNAL doesn't convey anything specific either.
I don't know, but CUSTOM still looks OK to me... :) I will be glad to change to a more appropriate name, though.

@Ted: What does that error mean btw?
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch, HBase-6165-v2.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Updated] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Himanshu Vashishtha (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Himanshu Vashishtha updated HBASE-6165:
---------------------------------------

    Attachment: HBase-6165-v3.patch

Thanks Andrew.

Revised patch with the following changes:
a) The call queue name and QOS are now replication-specific :)
b) default number of replication handlers is 3
c) Moved QOS attributes constants to HConstants.

Tested replication on a cluster; green jenkins
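
For illustration only, here is a minimal sketch of why a dedicated set of replication handlers helps. It is not the patch itself: the class name, the method name and the priority handler count are assumptions made up for this example, and plain java.util.concurrent thread pools stand in for the server's handler threads. Only the default of 3 replication handlers comes from the notes above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of HBase.
class DedicatedHandlers {
    static final int REPLICATION_HANDLERS = 3; // default mentioned in the patch notes
    static final int PRIORITY_HANDLERS = 2;    // illustrative value only

    /** Returns true if a priority call finishes while every replication handler is stuck. */
    static boolean metaScanCompletesDuringReplicationFlood() {
        ExecutorService replication = Executors.newFixedThreadPool(REPLICATION_HANDLERS);
        ExecutorService priority = Executors.newFixedThreadPool(PRIORITY_HANDLERS);
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Flood the replication pool with calls that block until released.
            for (int i = 0; i < 10; i++) {
                replication.submit(() -> { release.await(); return null; });
            }
            // The priority pool has its own threads, so this call is not starved.
            Future<String> metaScan = priority.submit(() -> "meta scan done");
            return "meta scan done".equals(metaScan.get(5, TimeUnit.SECONDS));
        } catch (Exception e) {
            return false;
        } finally {
            release.countDown();   // let the blocked replication calls finish
            replication.shutdown();
            priority.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(metaScanCompletesDuringReplicationFlood());
    }
}
```

If replication and priority calls shared one pool, the ten blocked calls would occupy every thread and the meta scan would time out, which is roughly the situation described in this issue.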


                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396052#comment-13396052 ] 

Jean-Daniel Cryans commented on HBASE-6165:
-------------------------------------------

The other solution is to have a different set of handlers, but this requires either hacking HBaseServer to add another queue and priority level or refactoring it to make it more configurable.
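
A rough sketch of what "another queue and priority level" could look like is below. It is an assumption-laden illustration, not HBaseServer code: the class, the queue fields and the numeric priority values are all hypothetical, and calls are represented as plain strings.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical router; the real server's dispatch logic may differ.
class QueueRouter {
    // Illustrative priority levels, assumed for this sketch.
    static final int NORMAL_QOS = 0;
    static final int REPLICATION_QOS = 5;
    static final int HIGH_QOS = 100;

    final BlockingQueue<String> callQueue = new LinkedBlockingQueue<>();
    final BlockingQueue<String> replicationQueue = new LinkedBlockingQueue<>();
    final BlockingQueue<String> priorityQueue = new LinkedBlockingQueue<>();

    /** Picks the queue for a call based on its priority level. */
    BlockingQueue<String> queueFor(int qos) {
        if (qos == REPLICATION_QOS) {
            return replicationQueue;   // drained only by replication handlers
        }
        if (qos >= HIGH_QOS) {
            return priorityQueue;      // e.g. .META. scans
        }
        return callQueue;              // ordinary user calls
    }

    void dispatch(String call, int qos) {
        queueFor(qos).add(call);
    }
}
```

With a separate queue per level, each queue can be drained by its own handler threads, so a backlog in one does not delay the others.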
                
> Replication can overrun .META scans on cluster re-start
> -------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13296141#comment-13296141 ] 

Lars Hofhansl commented on HBASE-6165:
--------------------------------------

What's a good approach to avoid this?
                
> Replication can overrun .META scans on cluster re-start
> -------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Himanshu Vashishtha (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445348#comment-13445348 ] 

Himanshu Vashishtha commented on HBASE-6165:
--------------------------------------------

It is based on top of commit e554fa9b0cc06c7a364c38bed53139da5e354b36; I took an update before creating it. Are there more commits after this that git doesn't have?

Feel free to create the new patch then.
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445396#comment-13445396 ] 

stack commented on HBASE-6165:
------------------------------

git can lag svn.  Would suggest you get an svn checkout and make sure patch applies there.
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444232#comment-13444232 ] 

Jean-Daniel Cryans commented on HBASE-6165:
-------------------------------------------

The 0.94 patch doesn't set the proper QOS:

{code}
-  @QosPriority(priority=HIGH_QOS)
+  @QosPriority(priority=HConstants.HIGH_QOS)
   public void replicateLogEntries(final HLog.Entry[] entries)
{code}
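
The hunk above only qualifies the constant, so the method would still run at high priority. As a sketch of how an annotation-driven priority lookup might work, and of what a replication-specific priority on this method could look like, consider the following. Everything here is an assumption for illustration: the Qos annotation, the Server class and the numeric values are stand-ins defined locally, not the HBase QosPriority annotation or its constants.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical stand-in for an annotation-based QoS lookup.
class QosLookup {
    // Illustrative values, assumed for this sketch.
    static final int NORMAL_QOS = 0;
    static final int REPLICATION_QOS = 5;
    static final int HIGH_QOS = 100;

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Qos {
        int priority() default NORMAL_QOS;
    }

    static class Server {
        @Qos(priority = REPLICATION_QOS)   // replication gets its own level
        public void replicateLogEntries(String[] entries) { }

        @Qos(priority = HIGH_QOS)
        public void scanMeta() { }

        public void get() { }              // unannotated: normal priority
    }

    /** Reads the priority declared on the named Server method via reflection. */
    static int priorityOf(String methodName) {
        for (Method m : Server.class.getMethods()) {
            if (m.getName().equals(methodName)) {
                Qos qos = m.getAnnotation(Qos.class);
                return qos == null ? NORMAL_QOS : qos.priority();
            }
        }
        return NORMAL_QOS;
    }
}
```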
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-94-v1.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Zhihong Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433327#comment-13433327 ] 

Zhihong Ted Yu commented on HBASE-6165:
---------------------------------------

Sounds good.
Consider renaming "hbase.regionserver.custom.priority.handler.count" to "hbase.regionserver.custom.handler.count"
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469986#comment-13469986 ] 

Hudson commented on HBASE-6165:
-------------------------------

Integrated in HBase-0.92-security #143 (See [https://builds.apache.org/job/HBase-0.92-security/143/])
    HBASE-6724 Port HBASE-6165 'Replication can overrun .META. scans on cluster re-start' to 0.92 (Revision 1381451)

     Result = FAILURE
tedyu : 
Files : 
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java

                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.94.2, 0.96.0
>
>         Attachments: 6165-v6.txt, HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432045#comment-13432045 ] 

Lars Hofhansl commented on HBASE-6165:
--------------------------------------

@Himanshu: Thanks. Yes makes sense. I like MetaHandlers.
Re: failing fast: I think instead of using an HTablePool the sink should create a Connection and ThreadPool and then create HTable on demand using these (see: HBASE-4805), together with short timeouts and few retries.
                
> Replication can overrun .META scans on cluster re-start
> -------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>         Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445314#comment-13445314 ] 

Hadoop QA commented on HBASE-6165:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12543165/HBase-6165-v5.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2742//console

This message is automatically generated.
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start

Posted by "Zhihong Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432022#comment-13432022 ] 

Zhihong Ted Yu commented on HBASE-6165:
---------------------------------------

w.r.t. the default value for hbase.regionserver.custom.priority.handler.count, I agree with Lars and Elliott that the default should be > 0.
Actually we should perform a check on the actual value: if the user specifies 0 and either replication or security is enabled, we should raise the value to, say, 3.
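
The check suggested above could be sketched roughly as follows. The class and method names are hypothetical, the two boolean flags stand in for however the server would detect that replication or security is enabled, and treating negative values the same as 0 is an extra assumption of this sketch.

```java
// Hypothetical helper; not taken from any attached patch.
class HandlerCountCheck {
    static final int MIN_HANDLERS_WHEN_NEEDED = 3; // the "say, 3" from the comment

    /**
     * Returns the handler count to use. A configured value of 0 (or less,
     * by assumption) is raised to the minimum when replication or security
     * needs the handlers; otherwise the configured value is kept.
     */
    static int effectiveHandlerCount(int configured,
                                     boolean replicationEnabled,
                                     boolean securityEnabled) {
        boolean handlersNeeded = replicationEnabled || securityEnabled;
        if (configured <= 0 && handlersNeeded) {
            return MIN_HANDLERS_WHEN_NEEDED;
        }
        return configured;
    }
}
```

For example, a configured 0 with replication enabled would come back as 3, while a configured 0 with neither feature enabled stays 0.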
                
> Replication can overrun .META scans on cluster re-start
> -------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>         Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445263#comment-13445263 ] 

Jean-Daniel Cryans commented on HBASE-6165:
-------------------------------------------

The trunk patch needs a refresh.
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Updated] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-6165:
---------------------------------

    Attachment: 6165-v6.txt

Trunk patch that I am going to commit. Also fixed TestPriorityRpc, which didn't compile.
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6165-v6.txt, HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448324#comment-13448324 ] 

Hudson commented on HBASE-6165:
-------------------------------

Integrated in HBase-0.94-security-on-Hadoop-23 #7 (See [https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/7/])
    HBASE-6165 Replication can overrun .META. scans on cluster re-start (Himanshu Vashishtha) (Revision 1379236)

     Result = FAILURE
larsh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRpcMetrics.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java

                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6165-v6.txt, HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Updated] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Himanshu Vashishtha (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Himanshu Vashishtha updated HBASE-6165:
---------------------------------------

    Attachment: HBase-6165-v4.patch

You are right Sir! Done.
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445447#comment-13445447 ] 

Lars Hofhansl commented on HBASE-6165:
--------------------------------------

The canonical repository is the SVN repository, Himanshu.
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449382#comment-13449382 ] 

Hudson commented on HBASE-6165:
-------------------------------

Integrated in HBase-0.92 #558 (See [https://builds.apache.org/job/HBase-0.92/558/])
    HBASE-6724 Port HBASE-6165 'Replication can overrun .META. scans on cluster re-start' to 0.92 (Revision 1381451)

     Result = FAILURE
Tedyu : 
Files : 
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java

                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6165-v6.txt, HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Updated] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Himanshu Vashishtha (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Himanshu Vashishtha updated HBASE-6165:
---------------------------------------

    Status: Open  (was: Patch Available)
    
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch, HBase-6165-v2.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13437478#comment-13437478 ] 

Hadoop QA commented on HBASE-6165:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12541517/HBase-6165-v4.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).

    -1 findbugs.  The patch appears to introduce 8 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2619//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2619//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2619//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2619//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2619//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2619//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2619//console

This message is automatically generated.
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Updated] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-6165:
---------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed to 0.94 and 0.96.
Thanks for the patch Himanshu.
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6165-v6.txt, HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start

Posted by "Himanshu Vashishtha (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432029#comment-13432029 ] 

Himanshu Vashishtha commented on HBASE-6165:
--------------------------------------------

[~eclark]: I used "custom" because I don't think the current naming scheme is appropriate (I started with medium/semi QOS, then changed it to Custom). "Priority" is something of a misnomer, as there is no priority as such; it's just a different set of handlers serving the requests.
Though we call them priorityHandlers, etc., they are just like regular handlers, but for meta operations. I think we should rename them to metaOpsHandlers (or metaHandlers). Yes, I just used a threshold between 0 and 10.

bq. Since this starts 0 "custom" priority handlers by default it will add another undocumented step when enabling replication. We should either make the number of handlers start by default > 0, or have the number depend on if replication is enabled.
I am OK with a default > 0; I don't think it should be tied to replication, as these handlers can be used for other methods too (such as security).

@Lars: 
bq. The naming is weird. These are not "Custom"QOS, but "Medium"QOS methods, right?
I hope the rationale is clear now.

bq. By default now (if hbase.regionserver.custom.priority.handler.count is not set), replicateWALEntry would use non-priority handlers... Which is not right, I think. It should revert back to the current behavior in that case (which is to use the priorityQOS).
Does a default > 0 sound good?


bq. What I still do not understand... Does this problem always happen? Does it happen because replicateWALEntry takes too long to finish? Does this only happen when the slave is already degraded for other reasons? Should we also work on replicateWALEntry failing faster in case of problems (shorter/fewer retries, etc)?

It can occur when the slave cluster is slow, and whenever it happens, it makes the entire cluster unresponsive. I have a patch that adds fail-fast behavior in the sink and have been testing it; it looks good so far. I tried to create a new JIRA for it but hit an IOE while creating it (see INFRA-5131). I will attach the patch once the issue is created.
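
The routing and fallback discussed above can be sketched roughly as follows. This is a minimal illustration, not code from any attached patch: the class QosRouterSketch, the queue names, the method-to-priority table, and the value 5 for the middle level are all assumptions (the thread only says the new level lies between 0 and 10). A handler count of 0 stands in for hbase.regionserver.custom.priority.handler.count being unset, and the fallback follows the behavior requested in the quoted review comment.

```java
/** Illustrative sketch only; names and values are assumptions, not the HBase API. */
final class QosRouterSketch {
    // Hypothetical priority levels. 10 is the existing priority threshold
    // mentioned in the thread; 5 is an arbitrary value between 0 and 10.
    static final int NORMAL_QOS = 0;
    static final int REPLICATION_QOS = 5;
    static final int PRIORITY_THRESHOLD = 10;

    /** The three call queues, each drained by its own set of handler threads. */
    enum Queue { CLIENT, REPLICATION, PRIORITY }

    /** Hypothetical method-to-priority table; real code would derive this differently. */
    static int qosOf(String method) {
        switch (method) {
            case "replicateWALEntry": return REPLICATION_QOS;
            case "openRegion":        return PRIORITY_THRESHOLD;
            default:                  return NORMAL_QOS;
        }
    }

    /**
     * Picks the queue for a call. If no replication handlers are configured,
     * replication calls fall back to the priority queue (the pre-patch behavior).
     */
    static Queue route(String method, int replicationHandlerCount) {
        int qos = qosOf(method);
        if (qos >= PRIORITY_THRESHOLD) {
            return Queue.PRIORITY;
        }
        if (qos > NORMAL_QOS) {
            return replicationHandlerCount > 0 ? Queue.REPLICATION : Queue.PRIORITY;
        }
        return Queue.CLIENT;
    }

    public static void main(String[] args) {
        System.out.println(route("replicateWALEntry", 3));
        System.out.println(route("replicateWALEntry", 0));
        System.out.println(route("get", 3));
    }
}
```

With a positive handler count, replicateWALEntry lands in its own queue; with a count of 0 it shares the priority queue again, which is the situation the issue describes.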
                
> Replication can overrun .META scans on cluster re-start
> -------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>         Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Zhihong Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433690#comment-13433690 ] 

Zhihong Ted Yu commented on HBASE-6165:
---------------------------------------

From https://builds.apache.org/job/PreCommit-HBASE-Build/2556/console:
{code}
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/dev-support/test-patch.sh: line 353:   393 Aborted
{code}
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch, HBase-6165-v2.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445346#comment-13445346 ] 

Lars Hofhansl commented on HBASE-6165:
--------------------------------------

:)

Unfortunately v5 still does not apply cleanly to trunk.
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Updated] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Himanshu Vashishtha (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Himanshu Vashishtha updated HBASE-6165:
---------------------------------------

    Status: Patch Available  (was: Open)
    
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Updated] (HBASE-6165) Replication can overrun .META scans on cluster re-start

Posted by "Zhihong Ted Yu (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Ted Yu updated HBASE-6165:
----------------------------------

    Status: Patch Available  (was: Open)
    
> Replication can overrun .META scans on cluster re-start
> -------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>         Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13437450#comment-13437450 ] 

Hadoop QA commented on HBASE-6165:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12541512/HBase-6165-v3.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).

    -1 findbugs.  The patch appears to introduce 8 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.client.TestFromClientSide

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2616//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2616//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2616//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2616//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2616//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2616//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2616//console

This message is automatically generated.
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Elliott Clark (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433707#comment-13433707 ] 

Elliott Clark commented on HBASE-6165:
--------------------------------------

I still don't understand the naming. There's nothing "custom" about these handlers. They handle replication. REPLICATION_OPS, MISC_OPS, INTERNAL_OPS: any of those seems to convey more about the type of operations these threads will handle.


                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch, HBase-6165-v2.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Himanshu Vashishtha (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432224#comment-13432224 ] 

Himanshu Vashishtha commented on HBASE-6165:
--------------------------------------------

Yeah, and I think the naming should be changed to reflect what the handlers actually do. So renaming the QOS levels and their respective handlers along the lines of CLIENT_OPS, CUSTOM_OPS, and META_OPS seems more appropriate.
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Updated] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Zhihong Ted Yu (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Ted Yu updated HBASE-6165:
----------------------------------

        Assignee: Himanshu Vashishtha
    Hadoop Flags: Reviewed
         Summary: Replication can overrun .META. scans on cluster re-start  (was: Replication can overrun .META scans on cluster re-start)
    
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Elliott Clark (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432205#comment-13432205 ] 

Elliott Clark commented on HBASE-6165:
--------------------------------------

{quote}Using priority is kind of a misnomer as there is no priority as such{quote}

The actual handlers don't imply some sort of QOS, but the naming does correspond to the {low|medium|high}-priority set of operations that can be in that handler's queue.
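
To illustrate why the handler set matters even though the threads themselves are identical, here is a small standalone sketch using plain java.util.concurrent thread pools. It is not HBase code: HandlerIsolationSketch, the pool sizes, and the one-second timeout are assumptions chosen for the demonstration. Blocked "replication" calls occupy every thread of a shared pool, so a "meta" call queued behind them cannot run; with a separate pool, the meta call completes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustrative sketch only; plain thread pools stand in for RPC handler sets. */
final class HandlerIsolationSketch {

    /** Returns true if a meta call finishes while all replication calls are blocked. */
    static boolean metaCallCompletes(boolean separatePools) {
        int handlers = 2; // hypothetical handler count
        ExecutorService replicationPool = Executors.newFixedThreadPool(handlers);
        ExecutorService metaPool =
            separatePools ? Executors.newFixedThreadPool(1) : replicationPool;
        CountDownLatch slowSink = new CountDownLatch(1); // models a slow slave cluster
        try {
            // Occupy every replication handler with a call that blocks.
            for (int i = 0; i < handlers; i++) {
                replicationPool.submit(() -> { slowSink.await(); return null; });
            }
            Future<String> meta = metaPool.submit(() -> "meta scan done");
            try {
                meta.get(1, TimeUnit.SECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // the meta call was starved behind blocked calls
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            slowSink.countDown(); // release the blocked calls so the pools can exit
            replicationPool.shutdown();
            metaPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared pool:    " + metaCallCompletes(false));
        System.out.println("separate pools: " + metaCallCompletes(true));
    }
}
```

The isolation comes entirely from which queue a call is placed in; none of the threads runs at a higher scheduling priority.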
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.


[jira] [Updated] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Himanshu Vashishtha (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Himanshu Vashishtha updated HBASE-6165:
---------------------------------------

    Attachment: HBase-6165-94-v1.patch

0.94 patch
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-94-v1.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.

[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445299#comment-13445299 ] 

Lars Hofhansl commented on HBASE-6165:
--------------------------------------

I'll make an updated patch.
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.

[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445465#comment-13445465 ] 

Lars Hofhansl commented on HBASE-6165:
--------------------------------------

I'll make a patch for now. For folks who like git, svn is a pain (or so I heard) :)
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.

[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Jeff Whiting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449245#comment-13449245 ] 

Jeff Whiting commented on HBASE-6165:
-------------------------------------

I may be a little late to the party, but why is replication using any kind of higher-than-normal priority handlers?

It looks like we all agree that they shouldn't be using the high priority handlers.  It looks like they now have their own medium priority handlers. But I don't see an argument as to why they don't just use the normal priority handlers.
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6165-v6.txt, HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.

[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432056#comment-13432056 ] 

Hadoop QA commented on HBASE-6165:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12540074/HBase-6165-v1.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).

    -1 findbugs.  The patch appears to introduce 9 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                  org.apache.hadoop.hbase.coprocessor.TestClassLoading
                  org.apache.hadoop.hbase.master.TestAssignmentManager
                  org.apache.hadoop.hbase.TestLocalHBaseCluster

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2542//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2542//console

This message is automatically generated.
                
> Replication can overrun .META scans on cluster re-start
> -------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.

[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start

Posted by "Elliott Clark (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432014#comment-13432014 ] 

Elliott Clark commented on HBASE-6165:
--------------------------------------

A better name is probably needed for the queue.  "Custom" doesn't really get across what can go into that QOS level (replication).
Since this starts 0 "custom" priority handlers by default, it will add another undocumented step when enabling replication.  We should either make the default number of handlers > 0, or have the number depend on whether replication is enabled.
Why choose the number 5 for the priority, given that QOS_THRESHOLD is 10?  (Even if the numbers are arbitrary, it seems like we should have some reason and a comment about the numbering scheme.)


Thanks for doing this.
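
For readers following the numbering question, here is a minimal sketch of how a three-level dispatch keyed on those numbers could look. Only QOS_THRESHOLD = 10 and the replication priority 5 come from the discussion above. The class, method and queue names, the strict ">" comparison against the threshold, and the use of priority 0 for ordinary calls are assumptions made for illustration; this is not the code in the patch.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical illustration only; not the HBaseServer code from the patch. */
final class QosDispatchSketch {
  // Values mentioned in the thread.
  static final int QOS_THRESHOLD = 10;
  static final int REPLICATION_QOS = 5;
  // Assumption: ordinary client calls carry priority 0.
  static final int NORMAL_QOS = 0;

  final BlockingQueue<String> normalQueue = new LinkedBlockingQueue<>();
  final BlockingQueue<String> replicationQueue = new LinkedBlockingQueue<>();
  final BlockingQueue<String> priorityQueue = new LinkedBlockingQueue<>();

  /** Picks the queue whose handlers will serve a call of the given priority. */
  BlockingQueue<String> queueFor(int priority) {
    if (priority > QOS_THRESHOLD) {
      return priorityQueue;      // e.g. .META. operations
    }
    if (priority == REPLICATION_QOS) {
      return replicationQueue;   // e.g. replicateWALEntry
    }
    return normalQueue;          // everything else
  }

  /** Enqueues a call (represented here by its name) on the matching queue. */
  void dispatch(String call, int priority) {
    queueFor(priority).add(call);
  }

  public static void main(String[] args) {
    QosDispatchSketch s = new QosDispatchSketch();
    s.dispatch("scan .META.", 100);
    s.dispatch("replicateWALEntry", REPLICATION_QOS);
    s.dispatch("get", NORMAL_QOS);
    System.out.println("priority=" + s.priorityQueue
        + " replication=" + s.replicationQueue
        + " normal=" + s.normalQueue);
  }
}
```

With a scheme like this, the value 5 only needs to differ from the normal priority and stay at or below the threshold, which is the kind of rule a comment next to the constants could state.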
                
> Replication can overrun .META scans on cluster re-start
> -------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>         Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.

[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Himanshu Vashishtha (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433313#comment-13433313 ] 

Himanshu Vashishtha commented on HBASE-6165:
--------------------------------------------

So, shall I upload a patch with a positive default value for the number of custom handlers then? For the naming of existing handlers, I can open another jira. Thoughts?
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.

[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445614#comment-13445614 ] 

Hudson commented on HBASE-6165:
-------------------------------

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #155 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/155/])
    HBASE-6165 Replication can overrun .META. scans on cluster re-start (Himanshu Vashishtha) (Revision 1379235)

     Result = FAILURE
larsh : 
Files : 
* /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRpcMetrics.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestPriorityRpc.java

                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6165-v6.txt, HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.

[jira] [Updated] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Himanshu Vashishtha (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Himanshu Vashishtha updated HBASE-6165:
---------------------------------------

    Attachment: HBase-6165-v2.patch

Made the default number of custom handlers 5 instead of 0; renamed the property as per Ted's suggestion.
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch, HBase-6165-v2.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.

[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Himanshu Vashishtha (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452557#comment-13452557 ] 

Himanshu Vashishtha commented on HBASE-6165:
--------------------------------------------

[~whitingj] Specifically, the replication-specific jira about deadlocking on normal handlers is HBASE-4280.
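
As background for why a separate handler pool can matter, here is a generic thread-pool starvation sketch. It is not a reproduction of the scenario reported in HBASE-4280 (see that issue for the actual details). The class name, the pool sizes, and the "applied" payload are hypothetical. It shows a handler that, while serving a call, issues a nested call and waits for it: if both calls are served by the same exhausted pool, the outer call never finishes.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical illustration of handler starvation; not HBase code. */
final class HandlerStarvationSketch {

  /**
   * Runs an outer call on {@code outer} that blocks on a nested call submitted
   * to {@code inner}. Returns true if the outer call finishes within the timeout.
   */
  static boolean completes(ExecutorService outer, ExecutorService inner, long timeoutMs) {
    Callable<String> innerCall = () -> "applied";
    Callable<String> outerCall = () -> inner.submit(innerCall).get();
    Future<String> result = outer.submit(outerCall);
    try {
      result.get(timeoutMs, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException e) {
      return false; // the outer call is still waiting for a free handler
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // One pool with a single handler: the nested call can never be picked up.
    ExecutorService shared = Executors.newFixedThreadPool(1);
    boolean sharedOk = completes(shared, shared, 200);
    shared.shutdownNow(); // interrupts the stuck handler so the JVM can exit

    // Separate pools: the nested call is served by a different handler.
    ExecutorService normal = Executors.newFixedThreadPool(1);
    ExecutorService medium = Executors.newFixedThreadPool(1);
    boolean separateOk = completes(normal, medium, 2000);
    normal.shutdown();
    medium.shutdown();

    System.out.println("shared pool completes: " + sharedOk);
    System.out.println("separate pools complete: " + separateOk);
  }
}
```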
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6165-v6.txt, HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.

[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432015#comment-13432015 ] 

Lars Hofhansl commented on HBASE-6165:
--------------------------------------

Patch looks good generally. A few comments:
# The naming is weird. These are not "Custom"QOS, but "Medium"QOS methods, right?
# Is there a way to generalize this to sets of Handlers with different priorities? (Not important, though.)
# By default now (if hbase.regionserver.custom.priority.handler.count is not set), replicateWALEntry would use non-priority handlers, which is not right, I think. It should revert to the current behavior in that case (which is to use the priorityQOS).

What I still do not understand: Does this problem always happen? Does it happen because replicateWALEntry takes too long to finish? Does this only happen when the slave is already degraded for other reasons? Should we also work on replicateWALEntry failing faster in case of problems (shorter/fewer retries, etc)?
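
The fallback asked for in the third comment could be sketched roughly as follows. The property name is the one quoted above. Everything else is an assumption made for illustration: a plain Map stands in for the HBase configuration object, and the class, enum and method names are hypothetical. This is not the patch itself.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of the suggested fallback; not the patch. */
final class ReplicationQueueFallbackSketch {
  // Property name as quoted in the review comment.
  static final String HANDLER_COUNT_KEY =
      "hbase.regionserver.custom.priority.handler.count";

  enum Target { MEDIUM, PRIORITY }

  /**
   * Chooses where replicateWALEntry calls go. If no dedicated handlers are
   * configured, fall back to the existing behavior (the priority handlers)
   * rather than silently dropping to the normal handlers.
   */
  static Target targetForReplication(Map<String, String> conf) {
    int handlers = Integer.parseInt(conf.getOrDefault(HANDLER_COUNT_KEY, "0"));
    return handlers > 0 ? Target.MEDIUM : Target.PRIORITY;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    System.out.println("unset: " + targetForReplication(conf));
    conf.put(HANDLER_COUNT_KEY, "5");
    System.out.println("set to 5: " + targetForReplication(conf));
  }
}
```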
                
> Replication can overrun .META scans on cluster re-start
> -------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>         Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.

[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445641#comment-13445641 ] 

Hudson commented on HBASE-6165:
-------------------------------

Integrated in HBase-0.94 #443 (See [https://builds.apache.org/job/HBase-0.94/443/])
    HBASE-6165 Replication can overrun .META. scans on cluster re-start (Himanshu Vashishtha) (Revision 1379236)

     Result = SUCCESS
larsh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRpcMetrics.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java

                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6165-v6.txt, HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.

[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

Posted by "Zhihong Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13437452#comment-13437452 ] 

Zhihong Ted Yu commented on HBASE-6165:
---------------------------------------

Patch v3 looks clean.
nit:
{code}
+    if(handlers != null) {
+      for(Handler h : handlers) {
{code}
Space should be added immediately before '('
                
> Replication can overrun .META. scans on cluster re-start
> --------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.

[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start

Posted by "Himanshu Vashishtha (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432068#comment-13432068 ] 

Himanshu Vashishtha commented on HBASE-6165:
--------------------------------------------

Created fail-fast replicationSink jira HBase-6550 (https://issues.apache.org/jira/browse/HBASE-6550)
                
> Replication can overrun .META scans on cluster re-start
> -------------------------------------------------------
>
>                 Key: HBASE-6165
>                 URL: https://issues.apache.org/jira/browse/HBASE-6165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.
