You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2011/03/18 18:42:29 UTC

[jira] Created: (HBASE-3671) Split report before we finish parent region open; workaround till 0.92; Race between split and OPENED processing

Split report before we finish parent region open; workaround till 0.92; Race between split and OPENED processing
----------------------------------------------------------------------------------------------------------------

                 Key: HBASE-3671
                 URL: https://issues.apache.org/jira/browse/HBASE-3671
             Project: HBase
          Issue Type: Bug
            Reporter: stack


This issue is about adding a workaround to 0.90 until we get proper fix in 0.92 (HBASE-3559).

Here is the sequence of events:

1. We start to process OPENED region event.
2. We receive a SPLIT of this region report.
3. SPLIT processing offline the region and onlines daughters.
4. Metascanner runs and clears out the region from .META. deleting it
5. The OPENED handler runs.  Marks the region online in Master memory.
6. Balancer runs.  Trys to balance a region that has been deleted.

Loops for ever.

Here is excerpt from logs.  It happened during startup, lots going on.  Could happen on regionserver crash I suppose, maybe, but we're susceptible during cluster start:

{code}
# We assign the region
2011-03-16 15:18:29,053 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Async create of unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 with OFFLINE state
...
2011-03-16 15:18:32,298 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=OFFLINE, ts=1300313909053, server=sv4borg39,60020,1300313564807
...
2011-03-16 15:18:32,732 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=OFFLINE, ts=1300313909053
...
2011-03-16 15:23:02,114 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
...
2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78 and set watcher; region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=sv4borg39,60020,1300313564807, region=3516b74d0c9d4458c2f2f715249e3f78

# At this point we've queued an Excecutor to run to process the OPENED event.  Now in comes the SPLIT.
2011-03-16 15:23:18,199 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT: tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.: Daughters; tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2., tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2. from sv4borg39,60020,1300313564807
2011-03-16 15:23:18,870 WARN org.apache.hadoop.hbase.master.AssignmentManager: Split report has RIT node (shouldnt have one): REGION => {NAME => 'tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.', STARTKEY => '\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07', ENDKEY => '\x00\x043L\xE7\xF50\x00\x00\x01\x00\x00I\x00\x00\x0C\x00\x00f\x00\x00\x0E\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x02u', ENCODED => 3516b74d0c9d4458c2f2f715249e3f78, TABLE => {{NAME => 'tsdb', FAMILIES => [{NAME => 't', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'LZO', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}} node: region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED

# Now metascanner runs and actually removes the parent region, deleting it all

2011-03-16 15:28:34,352 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter reference tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2., qualifier=splitA, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
2011-03-16 15:28:34,356 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter reference tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2., qualifier=splitB, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. because daughter splits no longer hold references
2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. because daughter splits no longer hold references

2011-03-16 15:28:34,444 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: DELETING region hdfs://sv4borg29:9000/hbase/tsdb/3516b74d0c9d4458c2f2f715249e3f78

2011-03-16 15:28:34,542 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. from META

# Now the OPENED executor runs, a good while after the above
#

2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 3516b74d0c9d4458c2f2f715249e3f78; deleting unassigned node
2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Deleting existing unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 that is in expected state RS_ZK_REGION_OPENED


2011-03-16 15:30:26,725 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78; data=region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED


2011-03-16 15:30:26,875 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Successfully deleted unassigned node for region 3516b74d0c9d4458c2f2f715249e3f78 in expected state RS_ZK_REGION_OPENED


2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1 Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78

2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. on sv4borg39,60020,1300313564807

# Now we have a region online in Master's memory but its not out in .META. nor in the FS.
# The balancer runs

2011-03-16 23:18:41,716 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., src=sv4borg39,60020,1300313564807, dest=sv4borg33,60020,1300342574666

2011-03-16 23:18:41,716 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. (offlining)

2011-03-16 23:18:41,718 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Server serverName=sv4borg39,60020,1300313564807, load=(requests=2, regions=504, usedHeap=929, maxHeap=6973) returned org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Received close for tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. but we are not serving it for 3516b74d0c9d4458c2f2f715249e3f78

2011-03-16 23:20:34,436 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=PENDING_CLOSE, ts=1300342802734

2011-03-16 23:20:34,437 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.

2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Set watcher on existing znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78

2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. (offlining)
2011-03-16 23:20:34,438 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempting to unassign region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. which is already pending close but forcing an additional close

{code}

Ad infinitum

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HBASE-3671) Split report before we finish parent region open; workaround till 0.92; Race between split and OPENED processing

Posted by "Benoit Sigoure (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008574#comment-13008574 ] 

Benoit Sigoure commented on HBASE-3671:
---------------------------------------

+1 too, thanks for the quick turnaround guys!

> Split report before we finish parent region open; workaround till 0.92; Race between split and OPENED processing
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3671
>                 URL: https://issues.apache.org/jira/browse/HBASE-3671
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: stack
>         Attachments: 3671.txt
>
>
> This issue is about adding a workaround to 0.90 until we get proper fix in 0.92 (HBASE-3559).
> Here is the sequence of events:
> 1. We start to process OPENED region event.
> 2. We receive a SPLIT of this region report.
> 3. SPLIT processing offline the region and onlines daughters.
> 4. Metascanner runs and clears out the region from .META. deleting it
> 5. The OPENED handler runs.  Marks the region online in Master memory.
> 6. Balancer runs.  Trys to balance a region that has been deleted.
> Loops for ever.
> Here is excerpt from logs.  It happened during startup, lots going on.  Could happen on regionserver crash I suppose, maybe, but we're susceptible during cluster start:
> {code}
> # We assign the region
> 2011-03-16 15:18:29,053 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Async create of unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 with OFFLINE state
> ...
> 2011-03-16 15:18:32,298 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=OFFLINE, ts=1300313909053, server=sv4borg39,60020,1300313564807
> ...
> 2011-03-16 15:18:32,732 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=OFFLINE, ts=1300313909053
> ...
> 2011-03-16 15:23:02,114 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> ...
> 2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78 and set watcher; region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> 2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=sv4borg39,60020,1300313564807, region=3516b74d0c9d4458c2f2f715249e3f78
> # At this point we've queued an Excecutor to run to process the OPENED event.  Now in comes the SPLIT.
> 2011-03-16 15:23:18,199 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT: tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.: Daughters; tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2., tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2. from sv4borg39,60020,1300313564807
> 2011-03-16 15:23:18,870 WARN org.apache.hadoop.hbase.master.AssignmentManager: Split report has RIT node (shouldnt have one): REGION => {NAME => 'tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.', STARTKEY => '\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07', ENDKEY => '\x00\x043L\xE7\xF50\x00\x00\x01\x00\x00I\x00\x00\x0C\x00\x00f\x00\x00\x0E\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x02u', ENCODED => 3516b74d0c9d4458c2f2f715249e3f78, TABLE => {{NAME => 'tsdb', FAMILIES => [{NAME => 't', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'LZO', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}} node: region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> # Now metascanner runs and actually removes the parent region, deleting it all
> 2011-03-16 15:28:34,352 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter reference tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2., qualifier=splitA, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 15:28:34,356 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter reference tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2., qualifier=splitB, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. because daughter splits no longer hold references
> 2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. because daughter splits no longer hold references
> 2011-03-16 15:28:34,444 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: DELETING region hdfs://sv4borg29:9000/hbase/tsdb/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 15:28:34,542 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. from META
> # Now the OPENED executor runs, a good while after the above
> #
> 2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 3516b74d0c9d4458c2f2f715249e3f78; deleting unassigned node
> 2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Deleting existing unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 that is in expected state RS_ZK_REGION_OPENED
> 2011-03-16 15:30:26,725 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78; data=region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> 2011-03-16 15:30:26,875 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Successfully deleted unassigned node for region 3516b74d0c9d4458c2f2f715249e3f78 in expected state RS_ZK_REGION_OPENED
> 2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1 Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. on sv4borg39,60020,1300313564807
> # Now we have a region online in Master's memory but its not out in .META. nor in the FS.
> # The balancer runs
> 2011-03-16 23:18:41,716 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., src=sv4borg39,60020,1300313564807, dest=sv4borg33,60020,1300342574666
> 2011-03-16 23:18:41,716 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. (offlining)
> 2011-03-16 23:18:41,718 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Server serverName=sv4borg39,60020,1300313564807, load=(requests=2, regions=504, usedHeap=929, maxHeap=6973) returned org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Received close for tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. but we are not serving it for 3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 23:20:34,436 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=PENDING_CLOSE, ts=1300342802734
> 2011-03-16 23:20:34,437 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Set watcher on existing znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. (offlining)
> 2011-03-16 23:20:34,438 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempting to unassign region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. which is already pending close but forcing an additional close
> {code}
> Ad infinitum

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (HBASE-3671) Split report before we finish parent region open; workaround till 0.92; Race between split and OPENED processing

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3671:
-------------------------

    Attachment: 3671.txt

Small fix.  Please review Jon/JD

> Split report before we finish parent region open; workaround till 0.92; Race between split and OPENED processing
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3671
>                 URL: https://issues.apache.org/jira/browse/HBASE-3671
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>         Attachments: 3671.txt
>
>
> This issue is about adding a workaround to 0.90 until we get proper fix in 0.92 (HBASE-3559).
> Here is the sequence of events:
> 1. We start to process OPENED region event.
> 2. We receive a SPLIT of this region report.
> 3. SPLIT processing offline the region and onlines daughters.
> 4. Metascanner runs and clears out the region from .META. deleting it
> 5. The OPENED handler runs.  Marks the region online in Master memory.
> 6. Balancer runs.  Trys to balance a region that has been deleted.
> Loops for ever.
> Here is excerpt from logs.  It happened during startup, lots going on.  Could happen on regionserver crash I suppose, maybe, but we're susceptible during cluster start:
> {code}
> # We assign the region
> 2011-03-16 15:18:29,053 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Async create of unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 with OFFLINE state
> ...
> 2011-03-16 15:18:32,298 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=OFFLINE, ts=1300313909053, server=sv4borg39,60020,1300313564807
> ...
> 2011-03-16 15:18:32,732 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=OFFLINE, ts=1300313909053
> ...
> 2011-03-16 15:23:02,114 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> ...
> 2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78 and set watcher; region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> 2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=sv4borg39,60020,1300313564807, region=3516b74d0c9d4458c2f2f715249e3f78
> # At this point we've queued an Excecutor to run to process the OPENED event.  Now in comes the SPLIT.
> 2011-03-16 15:23:18,199 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT: tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.: Daughters; tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2., tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2. from sv4borg39,60020,1300313564807
> 2011-03-16 15:23:18,870 WARN org.apache.hadoop.hbase.master.AssignmentManager: Split report has RIT node (shouldnt have one): REGION => {NAME => 'tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.', STARTKEY => '\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07', ENDKEY => '\x00\x043L\xE7\xF50\x00\x00\x01\x00\x00I\x00\x00\x0C\x00\x00f\x00\x00\x0E\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x02u', ENCODED => 3516b74d0c9d4458c2f2f715249e3f78, TABLE => {{NAME => 'tsdb', FAMILIES => [{NAME => 't', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'LZO', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}} node: region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> # Now metascanner runs and actually removes the parent region, deleting it all
> 2011-03-16 15:28:34,352 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter reference tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2., qualifier=splitA, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 15:28:34,356 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter reference tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2., qualifier=splitB, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. because daughter splits no longer hold references
> 2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. because daughter splits no longer hold references
> 2011-03-16 15:28:34,444 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: DELETING region hdfs://sv4borg29:9000/hbase/tsdb/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 15:28:34,542 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. from META
> # Now the OPENED executor runs, a good while after the above
> #
> 2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 3516b74d0c9d4458c2f2f715249e3f78; deleting unassigned node
> 2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Deleting existing unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 that is in expected state RS_ZK_REGION_OPENED
> 2011-03-16 15:30:26,725 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78; data=region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> 2011-03-16 15:30:26,875 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Successfully deleted unassigned node for region 3516b74d0c9d4458c2f2f715249e3f78 in expected state RS_ZK_REGION_OPENED
> 2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1 Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. on sv4borg39,60020,1300313564807
> # Now we have a region online in Master's memory but its not out in .META. nor in the FS.
> # The balancer runs
> 2011-03-16 23:18:41,716 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., src=sv4borg39,60020,1300313564807, dest=sv4borg33,60020,1300342574666
> 2011-03-16 23:18:41,716 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. (offlining)
> 2011-03-16 23:18:41,718 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Server serverName=sv4borg39,60020,1300313564807, load=(requests=2, regions=504, usedHeap=929, maxHeap=6973) returned org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Received close for tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. but we are not serving it for 3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 23:20:34,436 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=PENDING_CLOSE, ts=1300342802734
> 2011-03-16 23:20:34,437 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Set watcher on existing znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. (offlining)
> 2011-03-16 23:20:34,438 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempting to unassign region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. which is already pending close but forcing an additional close
> {code}
> Ad infinitum

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HBASE-3671) Split report before we finish parent region open; workaround till 0.92; Race between split and OPENED processing

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008865#comment-13008865 ] 

stack commented on HBASE-3671:
------------------------------

@Ted You are right.  Thanks for spotting this.  Let me fix.

> Split report before we finish parent region open; workaround till 0.92; Race between split and OPENED processing
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3671
>                 URL: https://issues.apache.org/jira/browse/HBASE-3671
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: stack
>            Assignee: stack
>         Attachments: 3671.txt
>
>
> This issue is about adding a workaround to 0.90 until we get proper fix in 0.92 (HBASE-3559).
> Here is the sequence of events:
> 1. We start to process OPENED region event.
> 2. We receive a SPLIT of this region report.
> 3. SPLIT processing offline the region and onlines daughters.
> 4. Metascanner runs and clears out the region from .META. deleting it
> 5. The OPENED handler runs.  Marks the region online in Master memory.
> 6. Balancer runs.  Trys to balance a region that has been deleted.
> Loops for ever.
> Here is excerpt from logs.  It happened during startup, lots going on.  Could happen on regionserver crash I suppose, maybe, but we're susceptible during cluster start:
> {code}
> # We assign the region
> 2011-03-16 15:18:29,053 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Async create of unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 with OFFLINE state
> ...
> 2011-03-16 15:18:32,298 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=OFFLINE, ts=1300313909053, server=sv4borg39,60020,1300313564807
> ...
> 2011-03-16 15:18:32,732 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=OFFLINE, ts=1300313909053
> ...
> 2011-03-16 15:23:02,114 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> ...
> 2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78 and set watcher; region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> 2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=sv4borg39,60020,1300313564807, region=3516b74d0c9d4458c2f2f715249e3f78
> # At this point we've queued an Excecutor to run to process the OPENED event.  Now in comes the SPLIT.
> 2011-03-16 15:23:18,199 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT: tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.: Daughters; tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2., tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2. from sv4borg39,60020,1300313564807
> 2011-03-16 15:23:18,870 WARN org.apache.hadoop.hbase.master.AssignmentManager: Split report has RIT node (shouldnt have one): REGION => {NAME => 'tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.', STARTKEY => '\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07', ENDKEY => '\x00\x043L\xE7\xF50\x00\x00\x01\x00\x00I\x00\x00\x0C\x00\x00f\x00\x00\x0E\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x02u', ENCODED => 3516b74d0c9d4458c2f2f715249e3f78, TABLE => {{NAME => 'tsdb', FAMILIES => [{NAME => 't', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'LZO', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}} node: region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> # Now metascanner runs and actually removes the parent region, deleting it all
> 2011-03-16 15:28:34,352 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter reference tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2., qualifier=splitA, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 15:28:34,356 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter reference tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2., qualifier=splitB, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. because daughter splits no longer hold references
> 2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. because daughter splits no longer hold references
> 2011-03-16 15:28:34,444 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: DELETING region hdfs://sv4borg29:9000/hbase/tsdb/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 15:28:34,542 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. from META
> # Now the OPENED executor runs, a good while after the above
> #
> 2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 3516b74d0c9d4458c2f2f715249e3f78; deleting unassigned node
> 2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Deleting existing unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 that is in expected state RS_ZK_REGION_OPENED
> 2011-03-16 15:30:26,725 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78; data=region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> 2011-03-16 15:30:26,875 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Successfully deleted unassigned node for region 3516b74d0c9d4458c2f2f715249e3f78 in expected state RS_ZK_REGION_OPENED
> 2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1 Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. on sv4borg39,60020,1300313564807
> # Now we have a region online in Master's memory but its not out in .META. nor in the FS.
> # The balancer runs
> 2011-03-16 23:18:41,716 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., src=sv4borg39,60020,1300313564807, dest=sv4borg33,60020,1300342574666
> 2011-03-16 23:18:41,716 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. (offlining)
> 2011-03-16 23:18:41,718 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Server serverName=sv4borg39,60020,1300313564807, load=(requests=2, regions=504, usedHeap=929, maxHeap=6973) returned org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Received close for tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. but we are not serving it for 3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 23:20:34,436 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=PENDING_CLOSE, ts=1300342802734
> 2011-03-16 23:20:34,437 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Set watcher on existing znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. (offlining)
> 2011-03-16 23:20:34,438 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempting to unassign region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. which is already pending close but forcing an additional close
> {code}
> Ad infinitum

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (HBASE-3671) Split report before we finish parent region open; workaround till 0.92; Race between split and OPENED processing

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3671:
-------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Thanks reviews B, Ted, and J-D.

> Split report before we finish parent region open; workaround till 0.92; Race between split and OPENED processing
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3671
>                 URL: https://issues.apache.org/jira/browse/HBASE-3671
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: stack
>            Assignee: stack
>         Attachments: 3671.txt
>
>
> This issue is about adding a workaround to 0.90 until we get proper fix in 0.92 (HBASE-3559).
> Here is the sequence of events:
> 1. We start to process OPENED region event.
> 2. We receive a SPLIT of this region report.
> 3. SPLIT processing offline the region and onlines daughters.
> 4. Metascanner runs and clears out the region from .META. deleting it
> 5. The OPENED handler runs.  Marks the region online in Master memory.
> 6. Balancer runs.  Trys to balance a region that has been deleted.
> Loops for ever.
> Here is excerpt from logs.  It happened during startup, lots going on.  Could happen on regionserver crash I suppose, maybe, but we're susceptible during cluster start:
> {code}
> # We assign the region
> 2011-03-16 15:18:29,053 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Async create of unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 with OFFLINE state
> ...
> 2011-03-16 15:18:32,298 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=OFFLINE, ts=1300313909053, server=sv4borg39,60020,1300313564807
> ...
> 2011-03-16 15:18:32,732 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=OFFLINE, ts=1300313909053
> ...
> 2011-03-16 15:23:02,114 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> ...
> 2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78 and set watcher; region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> 2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=sv4borg39,60020,1300313564807, region=3516b74d0c9d4458c2f2f715249e3f78
> # At this point we've queued an Excecutor to run to process the OPENED event.  Now in comes the SPLIT.
> 2011-03-16 15:23:18,199 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT: tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.: Daughters; tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2., tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2. from sv4borg39,60020,1300313564807
> 2011-03-16 15:23:18,870 WARN org.apache.hadoop.hbase.master.AssignmentManager: Split report has RIT node (shouldnt have one): REGION => {NAME => 'tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.', STARTKEY => '\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07', ENDKEY => '\x00\x043L\xE7\xF50\x00\x00\x01\x00\x00I\x00\x00\x0C\x00\x00f\x00\x00\x0E\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x02u', ENCODED => 3516b74d0c9d4458c2f2f715249e3f78, TABLE => {{NAME => 'tsdb', FAMILIES => [{NAME => 't', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'LZO', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}} node: region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> # Now metascanner runs and actually removes the parent region, deleting it all
> 2011-03-16 15:28:34,352 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter reference tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2., qualifier=splitA, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 15:28:34,356 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter reference tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2., qualifier=splitB, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. because daughter splits no longer hold references
> 2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. because daughter splits no longer hold references
> 2011-03-16 15:28:34,444 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: DELETING region hdfs://sv4borg29:9000/hbase/tsdb/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 15:28:34,542 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. from META
> # Now the OPENED executor runs, a good while after the above
> #
> 2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 3516b74d0c9d4458c2f2f715249e3f78; deleting unassigned node
> 2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Deleting existing unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 that is in expected state RS_ZK_REGION_OPENED
> 2011-03-16 15:30:26,725 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78; data=region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> 2011-03-16 15:30:26,875 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Successfully deleted unassigned node for region 3516b74d0c9d4458c2f2f715249e3f78 in expected state RS_ZK_REGION_OPENED
> 2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1 Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. on sv4borg39,60020,1300313564807
> # Now we have a region online in Master's memory but its not out in .META. nor in the FS.
> # The balancer runs
> 2011-03-16 23:18:41,716 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., src=sv4borg39,60020,1300313564807, dest=sv4borg33,60020,1300342574666
> 2011-03-16 23:18:41,716 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. (offlining)
> 2011-03-16 23:18:41,718 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Server serverName=sv4borg39,60020,1300313564807, load=(requests=2, regions=504, usedHeap=929, maxHeap=6973) returned org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Received close for tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. but we are not serving it for 3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 23:20:34,436 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=PENDING_CLOSE, ts=1300342802734
> 2011-03-16 23:20:34,437 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Set watcher on existing znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. (offlining)
> 2011-03-16 23:20:34,438 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempting to unassign region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. which is already pending close but forcing an additional close
> {code}
> Ad infinitum

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (HBASE-3671) Split report before we finish parent region open; workaround till 0.92; Race between split and OPENED processing

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3671:
-------------------------

    Affects Version/s: 0.90.2

> Split report before we finish parent region open; workaround till 0.92; Race between split and OPENED processing
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3671
>                 URL: https://issues.apache.org/jira/browse/HBASE-3671
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: stack
>         Attachments: 3671.txt
>
>
> This issue is about adding a workaround to 0.90 until we get proper fix in 0.92 (HBASE-3559).
> Here is the sequence of events:
> 1. We start to process OPENED region event.
> 2. We receive a SPLIT of this region report.
> 3. SPLIT processing offline the region and onlines daughters.
> 4. Metascanner runs and clears out the region from .META. deleting it
> 5. The OPENED handler runs.  Marks the region online in Master memory.
> 6. Balancer runs.  Trys to balance a region that has been deleted.
> Loops for ever.
> Here is excerpt from logs.  It happened during startup, lots going on.  Could happen on regionserver crash I suppose, maybe, but we're susceptible during cluster start:
> {code}
> # We assign the region
> 2011-03-16 15:18:29,053 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Async create of unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 with OFFLINE state
> ...
> 2011-03-16 15:18:32,298 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=OFFLINE, ts=1300313909053, server=sv4borg39,60020,1300313564807
> ...
> 2011-03-16 15:18:32,732 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=OFFLINE, ts=1300313909053
> ...
> 2011-03-16 15:23:02,114 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> ...
> 2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78 and set watcher; region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> 2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=sv4borg39,60020,1300313564807, region=3516b74d0c9d4458c2f2f715249e3f78
> # At this point we've queued an Excecutor to run to process the OPENED event.  Now in comes the SPLIT.
> 2011-03-16 15:23:18,199 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT: tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.: Daughters; tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2., tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2. from sv4borg39,60020,1300313564807
> 2011-03-16 15:23:18,870 WARN org.apache.hadoop.hbase.master.AssignmentManager: Split report has RIT node (shouldnt have one): REGION => {NAME => 'tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.', STARTKEY => '\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07', ENDKEY => '\x00\x043L\xE7\xF50\x00\x00\x01\x00\x00I\x00\x00\x0C\x00\x00f\x00\x00\x0E\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x02u', ENCODED => 3516b74d0c9d4458c2f2f715249e3f78, TABLE => {{NAME => 'tsdb', FAMILIES => [{NAME => 't', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'LZO', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}} node: region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> # Now metascanner runs and actually removes the parent region, deleting it all
> 2011-03-16 15:28:34,352 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter reference tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2., qualifier=splitA, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 15:28:34,356 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter reference tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2., qualifier=splitB, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. because daughter splits no longer hold references
> 2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. because daughter splits no longer hold references
> 2011-03-16 15:28:34,444 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: DELETING region hdfs://sv4borg29:9000/hbase/tsdb/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 15:28:34,542 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. from META
> # Now the OPENED executor runs, a good while after the above
> #
> 2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 3516b74d0c9d4458c2f2f715249e3f78; deleting unassigned node
> 2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Deleting existing unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 that is in expected state RS_ZK_REGION_OPENED
> 2011-03-16 15:30:26,725 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78; data=region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> 2011-03-16 15:30:26,875 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Successfully deleted unassigned node for region 3516b74d0c9d4458c2f2f715249e3f78 in expected state RS_ZK_REGION_OPENED
> 2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1 Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. on sv4borg39,60020,1300313564807
> # Now we have a region online in Master's memory but its not out in .META. nor in the FS.
> # The balancer runs
> 2011-03-16 23:18:41,716 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., src=sv4borg39,60020,1300313564807, dest=sv4borg33,60020,1300342574666
> 2011-03-16 23:18:41,716 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. (offlining)
> 2011-03-16 23:18:41,718 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Server serverName=sv4borg39,60020,1300313564807, load=(requests=2, regions=504, usedHeap=929, maxHeap=6973) returned org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Received close for tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. but we are not serving it for 3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 23:20:34,436 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=PENDING_CLOSE, ts=1300342802734
> 2011-03-16 23:20:34,437 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Set watcher on existing znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. (offlining)
> 2011-03-16 23:20:34,438 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempting to unassign region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. which is already pending close but forcing an additional close
> {code}
> Ad infinitum

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HBASE-3671) Split report before we finish parent region open; workaround till 0.92; Race between split and OPENED processing

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008863#comment-13008863 ] 

Ted Yu commented on HBASE-3671:
-------------------------------

Looks like there is inconsistency between javadoc and log message.
For isRegionInTransition():
{code}
   * @return Returns null if passed region is not in transition else the current
{code}
But the change in OpenedRegionHandler says:
{code}
    if (this.assignmentManager.isRegionInTransition(regionInfo) == null) {
      this.assignmentManager.regionOnline(regionInfo, serverInfo);
    } else {
      LOG.warn("Skipping the onlining of " + regionInfo.getRegionNameAsString() +
        " because regions is NOT in RIT -- presuming this is because it SPLIT");
    }
{code}

BTW TestMultipleTimestamps starts to fail after this change.

> Split report before we finish parent region open; workaround till 0.92; Race between split and OPENED processing
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3671
>                 URL: https://issues.apache.org/jira/browse/HBASE-3671
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: stack
>            Assignee: stack
>         Attachments: 3671.txt
>
>
> This issue is about adding a workaround to 0.90 until we get proper fix in 0.92 (HBASE-3559).
> Here is the sequence of events:
> 1. We start to process OPENED region event.
> 2. We receive a SPLIT of this region report.
> 3. SPLIT processing offline the region and onlines daughters.
> 4. Metascanner runs and clears out the region from .META. deleting it
> 5. The OPENED handler runs.  Marks the region online in Master memory.
> 6. Balancer runs.  Trys to balance a region that has been deleted.
> Loops for ever.
> Here is excerpt from logs.  It happened during startup, lots going on.  Could happen on regionserver crash I suppose, maybe, but we're susceptible during cluster start:
> {code}
> # We assign the region
> 2011-03-16 15:18:29,053 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Async create of unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 with OFFLINE state
> ...
> 2011-03-16 15:18:32,298 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=OFFLINE, ts=1300313909053, server=sv4borg39,60020,1300313564807
> ...
> 2011-03-16 15:18:32,732 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=OFFLINE, ts=1300313909053
> ...
> 2011-03-16 15:23:02,114 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> ...
> 2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78 and set watcher; region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> 2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=sv4borg39,60020,1300313564807, region=3516b74d0c9d4458c2f2f715249e3f78
> # At this point we've queued an Excecutor to run to process the OPENED event.  Now in comes the SPLIT.
> 2011-03-16 15:23:18,199 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT: tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.: Daughters; tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2., tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2. from sv4borg39,60020,1300313564807
> 2011-03-16 15:23:18,870 WARN org.apache.hadoop.hbase.master.AssignmentManager: Split report has RIT node (shouldnt have one): REGION => {NAME => 'tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.', STARTKEY => '\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07', ENDKEY => '\x00\x043L\xE7\xF50\x00\x00\x01\x00\x00I\x00\x00\x0C\x00\x00f\x00\x00\x0E\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x02u', ENCODED => 3516b74d0c9d4458c2f2f715249e3f78, TABLE => {{NAME => 'tsdb', FAMILIES => [{NAME => 't', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'LZO', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}} node: region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> # Now metascanner runs and actually removes the parent region, deleting it all
> 2011-03-16 15:28:34,352 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter reference tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2., qualifier=splitA, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 15:28:34,356 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter reference tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2., qualifier=splitB, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. because daughter splits no longer hold references
> 2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. because daughter splits no longer hold references
> 2011-03-16 15:28:34,444 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: DELETING region hdfs://sv4borg29:9000/hbase/tsdb/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 15:28:34,542 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. from META
> # Now the OPENED executor runs, a good while after the above
> #
> 2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 3516b74d0c9d4458c2f2f715249e3f78; deleting unassigned node
> 2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Deleting existing unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 that is in expected state RS_ZK_REGION_OPENED
> 2011-03-16 15:30:26,725 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78; data=region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> 2011-03-16 15:30:26,875 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Successfully deleted unassigned node for region 3516b74d0c9d4458c2f2f715249e3f78 in expected state RS_ZK_REGION_OPENED
> 2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1 Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. on sv4borg39,60020,1300313564807
> # Now we have a region online in Master's memory but its not out in .META. nor in the FS.
> # The balancer runs
> 2011-03-16 23:18:41,716 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., src=sv4borg39,60020,1300313564807, dest=sv4borg33,60020,1300342574666
> 2011-03-16 23:18:41,716 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. (offlining)
> 2011-03-16 23:18:41,718 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Server serverName=sv4borg39,60020,1300313564807, load=(requests=2, regions=504, usedHeap=929, maxHeap=6973) returned org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Received close for tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. but we are not serving it for 3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 23:20:34,436 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=PENDING_CLOSE, ts=1300342802734
> 2011-03-16 23:20:34,437 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Set watcher on existing znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. (offlining)
> 2011-03-16 23:20:34,438 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempting to unassign region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. which is already pending close but forcing an additional close
> {code}
> Ad infinitum

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (HBASE-3671) Split report before we finish parent region open; workaround till 0.92; Race between split and OPENED processing

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3671:
-------------------------

        Assignee: stack
    Hadoop Flags: [Reviewed]
          Status: Patch Available  (was: Open)

Thanks for reviews Ted and J-D.  Will commit soon w/ Ted suggestion.

> Split report before we finish parent region open; workaround till 0.92; Race between split and OPENED processing
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3671
>                 URL: https://issues.apache.org/jira/browse/HBASE-3671
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: stack
>            Assignee: stack
>         Attachments: 3671.txt
>
>
> This issue is about adding a workaround to 0.90 until we get proper fix in 0.92 (HBASE-3559).
> Here is the sequence of events:
> 1. We start to process OPENED region event.
> 2. We receive a SPLIT of this region report.
> 3. SPLIT processing offline the region and onlines daughters.
> 4. Metascanner runs and clears out the region from .META. deleting it
> 5. The OPENED handler runs.  Marks the region online in Master memory.
> 6. Balancer runs.  Trys to balance a region that has been deleted.
> Loops for ever.
> Here is excerpt from logs.  It happened during startup, lots going on.  Could happen on regionserver crash I suppose, maybe, but we're susceptible during cluster start:
> {code}
> # We assign the region
> 2011-03-16 15:18:29,053 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Async create of unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 with OFFLINE state
> ...
> 2011-03-16 15:18:32,298 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=OFFLINE, ts=1300313909053, server=sv4borg39,60020,1300313564807
> ...
> 2011-03-16 15:18:32,732 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=OFFLINE, ts=1300313909053
> ...
> 2011-03-16 15:23:02,114 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> ...
> 2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78 and set watcher; region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> 2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=sv4borg39,60020,1300313564807, region=3516b74d0c9d4458c2f2f715249e3f78
> # At this point we've queued an Excecutor to run to process the OPENED event.  Now in comes the SPLIT.
> 2011-03-16 15:23:18,199 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT: tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.: Daughters; tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2., tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2. from sv4borg39,60020,1300313564807
> 2011-03-16 15:23:18,870 WARN org.apache.hadoop.hbase.master.AssignmentManager: Split report has RIT node (shouldnt have one): REGION => {NAME => 'tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.', STARTKEY => '\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07', ENDKEY => '\x00\x043L\xE7\xF50\x00\x00\x01\x00\x00I\x00\x00\x0C\x00\x00f\x00\x00\x0E\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x02u', ENCODED => 3516b74d0c9d4458c2f2f715249e3f78, TABLE => {{NAME => 'tsdb', FAMILIES => [{NAME => 't', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'LZO', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}} node: region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> # Now metascanner runs and actually removes the parent region, deleting it all
> 2011-03-16 15:28:34,352 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter reference tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2., qualifier=splitA, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 15:28:34,356 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter reference tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2., qualifier=splitB, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. because daughter splits no longer hold references
> 2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. because daughter splits no longer hold references
> 2011-03-16 15:28:34,444 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: DELETING region hdfs://sv4borg29:9000/hbase/tsdb/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 15:28:34,542 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. from META
> # Now the OPENED executor runs, a good while after the above
> #
> 2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 3516b74d0c9d4458c2f2f715249e3f78; deleting unassigned node
> 2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Deleting existing unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 that is in expected state RS_ZK_REGION_OPENED
> 2011-03-16 15:30:26,725 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78; data=region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> 2011-03-16 15:30:26,875 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Successfully deleted unassigned node for region 3516b74d0c9d4458c2f2f715249e3f78 in expected state RS_ZK_REGION_OPENED
> 2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1 Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. on sv4borg39,60020,1300313564807
> # Now we have a region online in Master's memory but its not out in .META. nor in the FS.
> # The balancer runs
> 2011-03-16 23:18:41,716 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., src=sv4borg39,60020,1300313564807, dest=sv4borg33,60020,1300342574666
> 2011-03-16 23:18:41,716 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. (offlining)
> 2011-03-16 23:18:41,718 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Server serverName=sv4borg39,60020,1300313564807, load=(requests=2, regions=504, usedHeap=929, maxHeap=6973) returned org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Received close for tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. but we are not serving it for 3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 23:20:34,436 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=PENDING_CLOSE, ts=1300342802734
> 2011-03-16 23:20:34,437 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Set watcher on existing znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. (offlining)
> 2011-03-16 23:20:34,438 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempting to unassign region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. which is already pending close but forcing an additional close
> {code}
> Ad infinitum

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HBASE-3671) Split report before we finish parent region open; workaround till 0.92; Race between split and OPENED processing

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008889#comment-13008889 ] 

Hudson commented on HBASE-3671:
-------------------------------

Integrated in HBase-TRUNK #1798 (See [https://hudson.apache.org/hudson/job/HBase-TRUNK/1798/])
    

> Split report before we finish parent region open; workaround till 0.92; Race between split and OPENED processing
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3671
>                 URL: https://issues.apache.org/jira/browse/HBASE-3671
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: stack
>            Assignee: stack
>         Attachments: 3671.txt
>
>
> This issue is about adding a workaround to 0.90 until we get proper fix in 0.92 (HBASE-3559).
> Here is the sequence of events:
> 1. We start to process OPENED region event.
> 2. We receive a SPLIT of this region report.
> 3. SPLIT processing offline the region and onlines daughters.
> 4. Metascanner runs and clears out the region from .META. deleting it
> 5. The OPENED handler runs.  Marks the region online in Master memory.
> 6. Balancer runs.  Trys to balance a region that has been deleted.
> Loops for ever.
> Here is excerpt from logs.  It happened during startup, lots going on.  Could happen on regionserver crash I suppose, maybe, but we're susceptible during cluster start:
> {code}
> # We assign the region
> 2011-03-16 15:18:29,053 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Async create of unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 with OFFLINE state
> ...
> 2011-03-16 15:18:32,298 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=OFFLINE, ts=1300313909053, server=sv4borg39,60020,1300313564807
> ...
> 2011-03-16 15:18:32,732 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=OFFLINE, ts=1300313909053
> ...
> 2011-03-16 15:23:02,114 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> ...
> 2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78 and set watcher; region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> 2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=sv4borg39,60020,1300313564807, region=3516b74d0c9d4458c2f2f715249e3f78
> # At this point we've queued an Excecutor to run to process the OPENED event.  Now in comes the SPLIT.
> 2011-03-16 15:23:18,199 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT: tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.: Daughters; tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2., tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2. from sv4borg39,60020,1300313564807
> 2011-03-16 15:23:18,870 WARN org.apache.hadoop.hbase.master.AssignmentManager: Split report has RIT node (shouldnt have one): REGION => {NAME => 'tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.', STARTKEY => '\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07', ENDKEY => '\x00\x043L\xE7\xF50\x00\x00\x01\x00\x00I\x00\x00\x0C\x00\x00f\x00\x00\x0E\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x02u', ENCODED => 3516b74d0c9d4458c2f2f715249e3f78, TABLE => {{NAME => 'tsdb', FAMILIES => [{NAME => 't', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'LZO', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}} node: region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> # Now metascanner runs and actually removes the parent region, deleting it all
> 2011-03-16 15:28:34,352 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter reference tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2., qualifier=splitA, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 15:28:34,356 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter reference tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2., qualifier=splitB, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. because daughter splits no longer hold references
> 2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. because daughter splits no longer hold references
> 2011-03-16 15:28:34,444 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: DELETING region hdfs://sv4borg29:9000/hbase/tsdb/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 15:28:34,542 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. from META
> # Now the OPENED executor runs, a good while after the above
> #
> 2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 3516b74d0c9d4458c2f2f715249e3f78; deleting unassigned node
> 2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Deleting existing unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 that is in expected state RS_ZK_REGION_OPENED
> 2011-03-16 15:30:26,725 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78; data=region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> 2011-03-16 15:30:26,875 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Successfully deleted unassigned node for region 3516b74d0c9d4458c2f2f715249e3f78 in expected state RS_ZK_REGION_OPENED
> 2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1 Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. on sv4borg39,60020,1300313564807
> # Now we have a region online in Master's memory but its not out in .META. nor in the FS.
> # The balancer runs
> 2011-03-16 23:18:41,716 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., src=sv4borg39,60020,1300313564807, dest=sv4borg33,60020,1300342574666
> 2011-03-16 23:18:41,716 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. (offlining)
> 2011-03-16 23:18:41,718 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Server serverName=sv4borg39,60020,1300313564807, load=(requests=2, regions=504, usedHeap=929, maxHeap=6973) returned org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Received close for tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. but we are not serving it for 3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 23:20:34,436 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=PENDING_CLOSE, ts=1300342802734
> 2011-03-16 23:20:34,437 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Set watcher on existing znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. (offlining)
> 2011-03-16 23:20:34,438 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempting to unassign region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. which is already pending close but forcing an additional close
> {code}
> Ad infinitum

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HBASE-3671) Split report before we finish parent region open; workaround till 0.92; Race between split and OPENED processing

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008571#comment-13008571 ] 

Jean-Daniel Cryans commented on HBASE-3671:
-------------------------------------------

+1 after the typo Ted reported and if it passes all the tests.

> Split report before we finish parent region open; workaround till 0.92; Race between split and OPENED processing
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3671
>                 URL: https://issues.apache.org/jira/browse/HBASE-3671
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: stack
>         Attachments: 3671.txt
>
>
> This issue is about adding a workaround to 0.90 until we get proper fix in 0.92 (HBASE-3559).
> Here is the sequence of events:
> 1. We start to process OPENED region event.
> 2. We receive a SPLIT of this region report.
> 3. SPLIT processing offline the region and onlines daughters.
> 4. Metascanner runs and clears out the region from .META. deleting it
> 5. The OPENED handler runs.  Marks the region online in Master memory.
> 6. Balancer runs.  Trys to balance a region that has been deleted.
> Loops for ever.
> Here is excerpt from logs.  It happened during startup, lots going on.  Could happen on regionserver crash I suppose, maybe, but we're susceptible during cluster start:
> {code}
> # We assign the region
> 2011-03-16 15:18:29,053 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Async create of unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 with OFFLINE state
> ...
> 2011-03-16 15:18:32,298 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=OFFLINE, ts=1300313909053, server=sv4borg39,60020,1300313564807
> ...
> 2011-03-16 15:18:32,732 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=OFFLINE, ts=1300313909053
> ...
> 2011-03-16 15:23:02,114 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> ...
> 2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78 and set watcher; region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> 2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=sv4borg39,60020,1300313564807, region=3516b74d0c9d4458c2f2f715249e3f78
> # At this point we've queued an Excecutor to run to process the OPENED event.  Now in comes the SPLIT.
> 2011-03-16 15:23:18,199 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT: tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.: Daughters; tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2., tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2. from sv4borg39,60020,1300313564807
> 2011-03-16 15:23:18,870 WARN org.apache.hadoop.hbase.master.AssignmentManager: Split report has RIT node (shouldnt have one): REGION => {NAME => 'tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.', STARTKEY => '\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07', ENDKEY => '\x00\x043L\xE7\xF50\x00\x00\x01\x00\x00I\x00\x00\x0C\x00\x00f\x00\x00\x0E\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x02u', ENCODED => 3516b74d0c9d4458c2f2f715249e3f78, TABLE => {{NAME => 'tsdb', FAMILIES => [{NAME => 't', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'LZO', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}} node: region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> # Now metascanner runs and actually removes the parent region, deleting it all
> 2011-03-16 15:28:34,352 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter reference tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2., qualifier=splitA, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 15:28:34,356 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter reference tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2., qualifier=splitB, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. because daughter splits no longer hold references
> 2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. because daughter splits no longer hold references
> 2011-03-16 15:28:34,444 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: DELETING region hdfs://sv4borg29:9000/hbase/tsdb/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 15:28:34,542 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. from META
> # Now the OPENED executor runs, a good while after the above
> #
> 2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 3516b74d0c9d4458c2f2f715249e3f78; deleting unassigned node
> 2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Deleting existing unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 that is in expected state RS_ZK_REGION_OPENED
> 2011-03-16 15:30:26,725 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78; data=region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> 2011-03-16 15:30:26,875 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Successfully deleted unassigned node for region 3516b74d0c9d4458c2f2f715249e3f78 in expected state RS_ZK_REGION_OPENED
> 2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1 Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. on sv4borg39,60020,1300313564807
> # Now we have a region online in Master's memory but its not out in .META. nor in the FS.
> # The balancer runs
> 2011-03-16 23:18:41,716 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., src=sv4borg39,60020,1300313564807, dest=sv4borg33,60020,1300342574666
> 2011-03-16 23:18:41,716 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. (offlining)
> 2011-03-16 23:18:41,718 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Server serverName=sv4borg39,60020,1300313564807, load=(requests=2, regions=504, usedHeap=929, maxHeap=6973) returned org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Received close for tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. but we are not serving it for 3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 23:20:34,436 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=PENDING_CLOSE, ts=1300342802734
> 2011-03-16 23:20:34,437 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Set watcher on existing znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. (offlining)
> 2011-03-16 23:20:34,438 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempting to unassign region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. which is already pending close but forcing an additional close
> {code}
> Ad infinitum

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HBASE-3671) Split report before we finish parent region open; workaround till 0.92; Race between split and OPENED processing

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008567#comment-13008567 ] 

Ted Yu commented on HBASE-3671:
-------------------------------

Minor comment:
{code}
+      LOG.warn("Skipping the onining of " + regionInfo.getRegionNameAsString() +
+        " because regions is NOT in RIT -- presuming this is because it SPLIT");
{code}
should be:
{code}
+      LOG.warn("Skipping the onlining of " + regionInfo.getRegionNameAsString() +
+        " because region is NOT in RIT -- presuming this is because it SPLIT");
{code}


> Split report before we finish parent region open; workaround till 0.92; Race between split and OPENED processing
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3671
>                 URL: https://issues.apache.org/jira/browse/HBASE-3671
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: stack
>         Attachments: 3671.txt
>
>
> This issue is about adding a workaround to 0.90 until we get proper fix in 0.92 (HBASE-3559).
> Here is the sequence of events:
> 1. We start to process OPENED region event.
> 2. We receive a SPLIT of this region report.
> 3. SPLIT processing offline the region and onlines daughters.
> 4. Metascanner runs and clears out the region from .META. deleting it
> 5. The OPENED handler runs.  Marks the region online in Master memory.
> 6. Balancer runs.  Trys to balance a region that has been deleted.
> Loops for ever.
> Here is excerpt from logs.  It happened during startup, lots going on.  Could happen on regionserver crash I suppose, maybe, but we're susceptible during cluster start:
> {code}
> # We assign the region
> 2011-03-16 15:18:29,053 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Async create of unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 with OFFLINE state
> ...
> 2011-03-16 15:18:32,298 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=OFFLINE, ts=1300313909053, server=sv4borg39,60020,1300313564807
> ...
> 2011-03-16 15:18:32,732 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=OFFLINE, ts=1300313909053
> ...
> 2011-03-16 15:23:02,114 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> ...
> 2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78 and set watcher; region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> 2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=sv4borg39,60020,1300313564807, region=3516b74d0c9d4458c2f2f715249e3f78
> # At this point we've queued an Excecutor to run to process the OPENED event.  Now in comes the SPLIT.
> 2011-03-16 15:23:18,199 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT: tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.: Daughters; tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2., tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2. from sv4borg39,60020,1300313564807
> 2011-03-16 15:23:18,870 WARN org.apache.hadoop.hbase.master.AssignmentManager: Split report has RIT node (shouldnt have one): REGION => {NAME => 'tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.', STARTKEY => '\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07', ENDKEY => '\x00\x043L\xE7\xF50\x00\x00\x01\x00\x00I\x00\x00\x0C\x00\x00f\x00\x00\x0E\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x02u', ENCODED => 3516b74d0c9d4458c2f2f715249e3f78, TABLE => {{NAME => 'tsdb', FAMILIES => [{NAME => 't', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'LZO', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}} node: region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> # Now metascanner runs and actually removes the parent region, deleting it all
> 2011-03-16 15:28:34,352 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter reference tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2., qualifier=splitA, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 15:28:34,356 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter reference tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2., qualifier=splitB, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. because daughter splits no longer hold references
> 2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. because daughter splits no longer hold references
> 2011-03-16 15:28:34,444 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: DELETING region hdfs://sv4borg29:9000/hbase/tsdb/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 15:28:34,542 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. from META
> # Now the OPENED executor runs, a good while after the above
> #
> 2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 3516b74d0c9d4458c2f2f715249e3f78; deleting unassigned node
> 2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Deleting existing unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 that is in expected state RS_ZK_REGION_OPENED
> 2011-03-16 15:30:26,725 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78; data=region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> 2011-03-16 15:30:26,875 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Successfully deleted unassigned node for region 3516b74d0c9d4458c2f2f715249e3f78 in expected state RS_ZK_REGION_OPENED
> 2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1 Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. on sv4borg39,60020,1300313564807
> # Now we have a region online in Master's memory but its not out in .META. nor in the FS.
> # The balancer runs
> 2011-03-16 23:18:41,716 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., src=sv4borg39,60020,1300313564807, dest=sv4borg33,60020,1300342574666
> 2011-03-16 23:18:41,716 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. (offlining)
> 2011-03-16 23:18:41,718 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Server serverName=sv4borg39,60020,1300313564807, load=(requests=2, regions=504, usedHeap=929, maxHeap=6973) returned org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Received close for tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. but we are not serving it for 3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 23:20:34,436 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=PENDING_CLOSE, ts=1300342802734
> 2011-03-16 23:20:34,437 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Set watcher on existing znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. (offlining)
> 2011-03-16 23:20:34,438 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempting to unassign region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. which is already pending close but forcing an additional close
> {code}
> Ad infinitum

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira