You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2011/03/18 18:44:30 UTC

[jira] Updated: (HBASE-3671) Split report before we finish parent region open; workaround till 0.92; Race between split and OPENED processing

     [ https://issues.apache.org/jira/browse/HBASE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3671:
-------------------------

    Attachment: 3671.txt

Small fix.  Please review Jon/JD

> Split report before we finish parent region open; workaround till 0.92; Race between split and OPENED processing
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3671
>                 URL: https://issues.apache.org/jira/browse/HBASE-3671
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>         Attachments: 3671.txt
>
>
> This issue is about adding a workaround to 0.90 until we get proper fix in 0.92 (HBASE-3559).
> Here is the sequence of events:
> 1. We start to process OPENED region event.
> 2. We receive a SPLIT of this region report.
> 3. SPLIT processing offline the region and onlines daughters.
> 4. Metascanner runs and clears out the region from .META. deleting it
> 5. The OPENED handler runs.  Marks the region online in Master memory.
> 6. Balancer runs.  Trys to balance a region that has been deleted.
> Loops for ever.
> Here is excerpt from logs.  It happened during startup, lots going on.  Could happen on regionserver crash I suppose, maybe, but we're susceptible during cluster start:
> {code}
> # We assign the region
> 2011-03-16 15:18:29,053 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Async create of unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 with OFFLINE state
> ...
> 2011-03-16 15:18:32,298 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=OFFLINE, ts=1300313909053, server=sv4borg39,60020,1300313564807
> ...
> 2011-03-16 15:18:32,732 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=OFFLINE, ts=1300313909053
> ...
> 2011-03-16 15:23:02,114 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> ...
> 2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78 and set watcher; region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> 2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=sv4borg39,60020,1300313564807, region=3516b74d0c9d4458c2f2f715249e3f78
> # At this point we've queued an Excecutor to run to process the OPENED event.  Now in comes the SPLIT.
> 2011-03-16 15:23:18,199 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT: tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.: Daughters; tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2., tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2. from sv4borg39,60020,1300313564807
> 2011-03-16 15:23:18,870 WARN org.apache.hadoop.hbase.master.AssignmentManager: Split report has RIT node (shouldnt have one): REGION => {NAME => 'tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.', STARTKEY => '\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07', ENDKEY => '\x00\x043L\xE7\xF50\x00\x00\x01\x00\x00I\x00\x00\x0C\x00\x00f\x00\x00\x0E\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x02u', ENCODED => 3516b74d0c9d4458c2f2f715249e3f78, TABLE => {{NAME => 'tsdb', FAMILIES => [{NAME => 't', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'LZO', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}} node: region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> # Now metascanner runs and actually removes the parent region, deleting it all
> 2011-03-16 15:28:34,352 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter reference tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2., qualifier=splitA, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 15:28:34,356 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter reference tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2., qualifier=splitB, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. because daughter splits no longer hold references
> 2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. because daughter splits no longer hold references
> 2011-03-16 15:28:34,444 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: DELETING region hdfs://sv4borg29:9000/hbase/tsdb/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 15:28:34,542 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. from META
> # Now the OPENED executor runs, a good while after the above
> #
> 2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 3516b74d0c9d4458c2f2f715249e3f78; deleting unassigned node
> 2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Deleting existing unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 that is in expected state RS_ZK_REGION_OPENED
> 2011-03-16 15:30:26,725 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78; data=region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> 2011-03-16 15:30:26,875 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1 Successfully deleted unassigned node for region 3516b74d0c9d4458c2f2f715249e3f78 in expected state RS_ZK_REGION_OPENED
> 2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1 Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. on sv4borg39,60020,1300313564807
> # Now we have a region online in Master's memory but its not out in .META. nor in the FS.
> # The balancer runs
> 2011-03-16 23:18:41,716 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78., src=sv4borg39,60020,1300313564807, dest=sv4borg33,60020,1300342574666
> 2011-03-16 23:18:41,716 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. (offlining)
> 2011-03-16 23:18:41,718 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Server serverName=sv4borg39,60020,1300313564807, load=(requests=2, regions=504, usedHeap=929, maxHeap=6973) returned org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Received close for tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. but we are not serving it for 3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 23:20:34,436 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. state=PENDING_CLOSE, ts=1300342802734
> 2011-03-16 23:20:34,437 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1 Set watcher on existing znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. (offlining)
> 2011-03-16 23:20:34,438 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempting to unassign region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78. which is already pending close but forcing an additional close
> {code}
> Ad infinitum

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira