Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2013/08/02 22:03:49 UTC
[jira] [Updated] (HBASE-8646) Intermittent TestIOFencing#testFencingAroundCompaction failure due to region getting stuck in compaction

     [ https://issues.apache.org/jira/browse/HBASE-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-8646:
-------------------------

       Resolution: Fixed
    Fix Version/s: 0.98.0
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

Resolving this one; the patch has been applied to trunk and 0.95.  HBASE-9023 is still open for the compaction failure with WAL sync, which is still happening pretty frequently.
                
> Intermittent TestIOFencing#testFencingAroundCompaction failure due to region getting stuck in compaction
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-8646
>                 URL: https://issues.apache.org/jira/browse/HBASE-8646
>             Project: HBase
>          Issue Type: Test
>            Reporter: Ted Yu
>            Assignee: Enis Soztutar
>             Fix For: 0.98.0, 0.95.2
>
>         Attachments: hbase-8646_v1.patch
>
>
> From http://54.241.6.143/job/HBase-TRUNK/org.apache.hbase$hbase-server/348/testReport/junit/org.apache.hadoop.hbase/TestIOFencing/testFencingAroundCompaction/ (the underlying region is tabletest,,1369855507443.c251a1d71e75fed8e490db63419edcf1.):
> {code}
> 2013-05-29 19:25:20,363 DEBUG [pool-1-thread-1] catalog.CatalogTracker(208): Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@6280d069
> 2013-05-29 19:25:20,366 INFO  [pool-1-thread-1] hbase.TestIOFencing(255): Waiting for compaction to be about to start
> 2013-05-29 19:25:20,367 DEBUG [pool-1-thread-1] hbase.TestIOFencing$CompactionBlockerRegion(107): waiting for compaction to block
> 2013-05-29 19:25:20,367 DEBUG [pool-1-thread-1] hbase.TestIOFencing$CompactionBlockerRegion(109): compaction block reached
> 2013-05-29 19:25:20,367 INFO  [pool-1-thread-1] hbase.TestIOFencing(257): Starting a new server
> 2013-05-29 19:25:20,424 DEBUG [pool-1-thread-1] client.HConnectionManager(2811): regionserver/ip-10-197-74-184.us-west-1.compute.internal/10.197.74.184:0 HConnection server-to-server retries=100
> ...
> 2013-05-29 19:25:20,861 INFO  [pool-1-thread-1] hbase.TestIOFencing(260): Killing region server ZK lease
> ...
> 2013-05-29 19:25:21,030 DEBUG [RS_CLOSE_REGION-ip-10-197-74-184.us-west-1.compute.internal,37836,1369855503920-0] handler.CloseRegionHandler(125): Processing close of tabletest,,1369855507443.c251a1d71e75fed8e490db63419edcf1.
> 2013-05-29 19:25:21,031 DEBUG [RS_CLOSE_REGION-ip-10-197-74-184.us-west-1.compute.internal,37836,1369855503920-0] regionserver.HRegion(928): Closing tabletest,,1369855507443.c251a1d71e75fed8e490db63419edcf1.: disabling compactions & flushes
> 2013-05-29 19:25:21,031 DEBUG [RS_CLOSE_REGION-ip-10-197-74-184.us-west-1.compute.internal,37836,1369855503920-0] regionserver.HRegion(1022): waiting for 1 compactions to complete for region tabletest,,1369855507443.c251a1d71e75fed8e490db63419edcf1.
> ...
> 2013-05-29 19:25:27,037 INFO  [pool-1-thread-1] hbase.TestIOFencing(265): Waiting for the new server to pick up the region tabletest,,1369855507443.c251a1d71e75fed8e490db63419edcf1.
> {code}
> The test started a new region server. However, the region got stuck in:
> {code}
>   public void waitForFlushesAndCompactions() {
>     synchronized (writestate) {
>       while (writestate.compacting > 0 || writestate.flushing) {
>         LOG.debug("waiting for " + writestate.compacting + " compactions"
>             + (writestate.flushing ? " & cache flush" : "") + " to complete for region " + this);
>         try {
>           writestate.wait();
> {code}
> This led to the timeout:
> {code}
>         assertTrue("Timed out waiting for new server to open region",
>           System.currentTimeMillis() - startWaitTime < 60000);
> {code}
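For readers unfamiliar with the pattern, the hang described above can be reproduced in isolation. The following is only a minimal sketch, not HBase code and not the change in hbase-8646_v1.patch: WriteStateDemo, beginCompaction, endCompaction, and the timeout parameter are hypothetical names introduced for illustration. It assumes the same guarded-wait loop as the quoted waitForFlushesAndCompactions(), with a deadline added so the demo terminates instead of waiting forever.

```java
import java.util.concurrent.TimeUnit;

public class WriteStateDemo {
    /** Hypothetical stand-in for the region's write state; not the HBase class. */
    static final class WriteState {
        int compacting = 0;
        boolean flushing = false;
    }

    private final WriteState writestate = new WriteState();

    /** Marks one compaction as in progress. */
    public void beginCompaction() {
        synchronized (writestate) {
            writestate.compacting++;
        }
    }

    /** Marks one compaction as finished and wakes any waiting closer. */
    public void endCompaction() {
        synchronized (writestate) {
            writestate.compacting--;
            writestate.notifyAll();
        }
    }

    /**
     * Same guarded-wait loop as the quoted method, plus a deadline (an
     * assumption of this sketch). Returns true once no compaction or flush
     * is in progress, false on timeout or interruption.
     */
    public boolean waitForFlushesAndCompactions(long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        synchronized (writestate) {
            while (writestate.compacting > 0 || writestate.flushing) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    return false; // an untimed wait() would block here indefinitely
                }
                try {
                    writestate.wait(remainingMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Case 1: the compaction never finishes, so the closer cannot make progress.
        WriteStateDemo stuck = new WriteStateDemo();
        stuck.beginCompaction();
        System.out.println("blocked compaction, drained=" + stuck.waitForFlushesAndCompactions(200));

        // Case 2: another thread finishes the compaction and notifies the closer.
        WriteStateDemo ok = new WriteStateDemo();
        ok.beginCompaction();
        Thread compactor = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            ok.endCompaction();
        });
        compactor.start();
        System.out.println("finished compaction, drained=" + ok.waitForFlushesAndCompactions(5000));
        compactor.join();
    }
}
```

In the first case nothing ever decrements the counter, which mirrors the closing region waiting on a compaction that the test is holding blocked; in the second case the waiter is released as soon as the counter reaches zero and notifyAll() is called.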
