Posted to user@hbase.apache.org by Marc Hoppins <ma...@eset.sk> on 2021/02/19 11:22:21 UTC
Region in Transition
Hi all,
The RIT message shows the following:
Owner procedure: {
  ID => '73827', PARENT_ID => '73587', STATE => 'WAITING_TIMEOUT', OWNER => 'hbase',
  TYPE => 'UnassignProcedure table=hds2_md5, region=f25fe93e24b34cb2f7fffddee1d89eec, server=ba-hbase25.jumbo.hq.com,16020,1604475904456',
  START_TIME => 'Thu Feb 18 06:31:06 CET 2021', LAST_UPDATE => 'Fri Feb 19 10:49:20 CET 2021',
  PARAMETERS => [ {
    transitionState => 'REGION_TRANSITION_DISPATCH',
    regionInfo => { regionId => '1535957697205', tableName => { namespace => 'ZGVmYXVsdA==', qualifier => 'aGRzMl9tZDU=' }, startKey => 'QkRGRkVFRg==', endKey => 'QkVBQTgyMUQy', offline => 'false', split => 'false', replicaId => '0' },
    hostingServer => { hostName => 'ba-hbase25.jumbo.hq.eset.com', port => '16020', startCode => '1604475904456' },
    attempt => '179'
  } ]
}
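The tableName, startKey and endKey values in the dump appear to be base64-encoded bytes. A minimal Python sketch, assuming standard base64 and printable row keys, decodes them: namespace 'default', table 'hds2_md5', start key 'BDFFEEF' and end key 'BEAA821D2', so the stuck UnassignProcedure refers to the same region (start key BDFFEEF) that the Master UI lists.

```python
import base64

# Values copied verbatim from the regionInfo block of the RIT message.
# Assumption: byte[] fields in the procedure dump are rendered as standard base64.
ENCODED_FIELDS = {
    "namespace": "ZGVmYXVsdA==",
    "qualifier": "aGRzMl9tZDU=",
    "startKey": "QkRGRkVFRg==",
    "endKey": "QkVBQTgyMUQy",
}


def decode_fields(fields: dict) -> dict:
    """Base64-decode each value and read the bytes as UTF-8 text.

    Row keys are arbitrary bytes in HBase; UTF-8 decoding only works here
    because these particular keys happen to be printable ASCII.
    """
    return {name: base64.b64decode(value).decode("utf-8") for name, value in fields.items()}


if __name__ == "__main__":
    for name, text in decode_fields(ENCODED_FIELDS).items():
        print(f"{name:10} {text}")
```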
In the HBase Master UI (Table details), region 'hds2_md5,BDFFEEF,1535957697205.f25fe93e24b34cb2f7fffddee1d89eec' is shown as being on region server ba-hbase18.jumbo.hq.com.
So, is the region hosted on hbase25 and being moved TO hbase18?
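One way to compare the two locations is to split the names into their parts. HBase region names have the form table,startKey,regionId.encodedName, and server names the form host,port,startCode, where startCode is the RegionServer's start time in epoch milliseconds. The helpers below are hypothetical illustrations, not an HBase API; they assume replicaId 0 (as in the dump) and use the hostingServer fields from the procedure. The start code decodes to 2020-11-04 UTC, so the UnassignProcedure is still addressed to the hbase25 RegionServer instance started on that date, while the UI lists hbase18 for the same encoded region name.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class RegionName(NamedTuple):
    table: str
    start_key: str
    region_id: int
    encoded_name: str


class ServerName(NamedTuple):
    host: str
    port: int
    start_code: int


def parse_region_name(name: str) -> RegionName:
    """Split 'table,startKey,regionId.encodedName' (replicaId 0 assumed).

    Table names cannot contain commas but start keys can, so the table is
    split off the front and the region id off the back.
    """
    table, rest = name.split(",", 1)
    start_key, tail = rest.rsplit(",", 1)
    region_id, encoded = tail.split(".", 1)
    return RegionName(table, start_key, int(region_id), encoded.rstrip("."))


def parse_server_name(name: str) -> ServerName:
    """Split 'host,port,startCode'."""
    host, port, start_code = name.split(",")
    return ServerName(host, int(port), int(start_code))


def millis_to_utc(ms: int) -> datetime:
    """Convert an epoch-milliseconds value (such as a start code) to UTC."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# Region as shown in the Master UI; server as recorded in the procedure's hostingServer.
ui_region = parse_region_name("hds2_md5,BDFFEEF,1535957697205.f25fe93e24b34cb2f7fffddee1d89eec")
proc_server = parse_server_name("ba-hbase25.jumbo.hq.eset.com,16020,1604475904456")

if __name__ == "__main__":
    print(ui_region)
    print(proc_server.host, "started", millis_to_utc(proc_server.start_code).isoformat())
```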
For some reason the table is not enabled at this time.
Table hds2_md5
Table Attributes
Attribute Name   Value   Description
Enabled          false   Is the table enabled
Compaction       NONE    Is the table compacting
The table has to be online to perform these kinds of moves, yes? Surely a RIT is not going to occur if the table is disabled.
There was a network issue where traffic increased on some paths as other paths went down.
So one question could be: was the table taken offline during this unassign? But with more than 30,000 regions, it is likely that other assigns/unassigns were being carried out on this and other tables.
Or was the table disabled with a view to performing some fix on this RIT? (The data 'owners' are currently unavailable for comment.) The table has been offline for at least one day.
One of the techies stopped the RegionServer instance on the hbase25 node to try to force some movement.
Thanks in advance.
RE: Region in Transition
Posted by Marc Hoppins <ma...@eset.sk>.
Sorry, it seems my cut/paste omitted the parent PID for most of these. The parent is the initial DisableTableProcedure (73587).
Is there a sane way to kill the child procedures first, and then the parent?
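Killing children before their parent amounts to a post-order walk of the parent/child tree, skipping procedures that have already finished. A minimal Python sketch over the PIDs in this thread, assuming only 73827 is an unfinished child of 73587 (the "73588 73587 SUCCESS" rows are already done), yields the order 73827, then 73587. It only computes an order and does not contact the Master; the actual bypass would presumably go through the HBCK2 tool in hbase-operator-tools (its `bypass` command), whose own usage text should be checked for the exact options.

```python
from collections import defaultdict
from typing import List, Optional, Tuple

# (pid, parent_pid, state) rows taken from the procedure listing in this thread.
# Assumption: only rows with an explicit parent column are linked; the many
# "73588 73587 SUCCESS" rows are represented by 73588 alone.
PROCEDURES: List[Tuple[int, Optional[int], str]] = [
    (73587, None, "WAITING"),
    (73588, 73587, "SUCCESS"),
    (73827, 73587, "WAITING_TIMEOUT"),
]


def children_first(procs, root: int) -> List[int]:
    """Return unfinished procedures under `root`, each child before its parent."""
    children = defaultdict(list)
    state = {}
    for pid, parent, st in procs:
        state[pid] = st
        if parent is not None:
            children[parent].append(pid)

    order: List[int] = []

    def visit(pid: int) -> None:
        for child in sorted(children[pid]):
            visit(child)
        if state.get(pid) != "SUCCESS":  # finished procedures need no action
            order.append(pid)

    visit(root)
    return order


if __name__ == "__main__":
    print(children_first(PROCEDURES, 73587))  # [73827, 73587]
```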
-----Original Message-----
From: Marc Hoppins <ma...@eset.sk>
Sent: Monday, February 22, 2021 12:16 PM
To: user@hbase.apache.org
Subject: RE: Region in Transition
Hi all,
Further:
Table 'hds2_md5' is disabled. The following procedures exist (in the locks/procedures view):
PID     Parent  State            Owner  Type
73587           WAITING          hbase  DisableTableProcedure table=hds2_md5
73827   73587   WAITING_TIMEOUT  hbase  UnassignProcedure table=hds2_md5
73937           RUNNABLE         jumbo  AssignProcedure table=hds2_md5
73938           RUNNABLE         jumbo  AssignProcedure table=hds2_md5
73949           RUNNABLE         jumbo  EnableTableProcedure table=hds2_md5
78370           RUNNABLE         jumbo  AssignProcedure table=hds2_md5
78371           RUNNABLE         jumbo  AssignProcedure table=hds2_md5
78372           RUNNABLE         jumbo  AssignProcedure table=hds2_md5
87386           RUNNABLE         jumbo  EnableTableProcedure table=hds2_md5
123914          RUNNABLE         hbase  EnableTableProcedure table=hds2_md5
And then a whole bunch of these:
73588   73587   SUCCESS          hbase  UnassignProcedure table=hds2_md5
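The listing can also be checked mechanically. A minimal Python sketch, assuming the whitespace-separated column order PID, optional parent PID, state, owner, type, table=NAME as pasted above, counts procedures per type and state and flags tables that have both an unfinished DisableTableProcedure and an unfinished EnableTableProcedure. For hds2_md5 it reports five RUNNABLE AssignProcedures and three RUNNABLE EnableTableProcedures (two owned by 'jumbo', one by 'hbase') alongside the WAITING DisableTableProcedure, i.e. conflicting disable and enable requests outstanding on the same table.

```python
import re
from collections import Counter, defaultdict
from typing import List, NamedTuple, Optional

# Procedure rows as pasted in this thread (one SUCCESS child kept as a sample).
LISTING = """\
73587 WAITING hbase DisableTableProcedure table=hds2_md5
73827 73587 WAITING_TIMEOUT hbase UnassignProcedure table=hds2_md5
73937 RUNNABLE jumbo AssignProcedure table=hds2_md5
73938 RUNNABLE jumbo AssignProcedure table=hds2_md5
73949 RUNNABLE jumbo EnableTableProcedure table=hds2_md5
78370 RUNNABLE jumbo AssignProcedure table=hds2_md5
78371 RUNNABLE jumbo AssignProcedure table=hds2_md5
78372 RUNNABLE jumbo AssignProcedure table=hds2_md5
87386 RUNNABLE jumbo EnableTableProcedure table=hds2_md5
123914 RUNNABLE hbase EnableTableProcedure table=hds2_md5
73588 73587 SUCCESS hbase UnassignProcedure table=hds2_md5
"""

# PID, optional parent PID, state, owner, procedure type, table=NAME
ROW_RE = re.compile(
    r"^(?P<pid>\d+)\s+(?:(?P<parent>\d+)\s+)?(?P<state>[A-Z_]+)\s+"
    r"(?P<owner>\S+)\s+(?P<kind>\w+)\s+table=(?P<table>\S+)$"
)


class Proc(NamedTuple):
    pid: int
    parent: Optional[int]
    state: str
    owner: str
    kind: str
    table: str


def parse(listing: str) -> List[Proc]:
    procs = []
    for line in listing.splitlines():
        m = ROW_RE.match(line.strip())
        if m is None:
            continue  # skip blank or unrecognised lines
        parent = int(m["parent"]) if m["parent"] else None
        procs.append(Proc(int(m["pid"]), parent, m["state"], m["owner"], m["kind"], m["table"]))
    return procs


def conflicting_tables(procs: List[Proc]) -> List[str]:
    """Tables with both an unfinished disable and an unfinished enable."""
    pending = defaultdict(set)
    for p in procs:
        if p.state != "SUCCESS" and p.kind in ("DisableTableProcedure", "EnableTableProcedure"):
            pending[p.table].add(p.kind)
    return sorted(t for t, kinds in pending.items() if len(kinds) == 2)


if __name__ == "__main__":
    procs = parse(LISTING)
    for (kind, state), n in sorted(Counter((p.kind, p.state) for p in procs).items()):
        print(f"{n:3}  {state:16} {kind}")
    print("disable and enable both pending:", conflicting_tables(procs))
```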
I am told that the table can remain disabled afterwards.
What is the right way to fix these issues? I have built an 'operator-tools1.0.0' jar and dropped it in place on the active master.
Thanks again
Marc
-----Original Message-----
From: Marc Hoppins <ma...@eset.sk>
Sent: Friday, February 19, 2021 12:22 PM
To: user@hbase.apache.org
Subject: Region in Transition