Posted to user@hbase.apache.org by Marc Hoppins <ma...@eset.sk> on 2021/02/19 11:22:21 UTC

Region in Transition

Hi all,

The RIT message shows the following:

Owner procedure: { ID => '73827', PARENT_ID => '73587', STATE => 'WAITING_TIMEOUT', OWNER => 'hbase', TYPE => 'UnassignProcedure table=hds2_md5, region=f25fe93e24b34cb2f7fffddee1d89eec, server=ba-hbase25.jumbo.hq. com,16020,1604475904456', START_TIME => 'Thu Feb 18 06:31:06 CET 2021', LAST_UPDATE => 'Fri Feb 19 10:49:20 CET 2021', PARAMETERS => [ { transitionState => 'REGION_TRANSITION_DISPATCH', regionInfo => { regionId => '1535957697205', tableName => { namespace => 'ZGVmYXVsdA==', qualifier => 'aGRzMl9tZDU=' }, startKey => 'QkRGRkVFRg==', endKey => 'QkVBQTgyMUQy', offline => 'false', split => 'false', replicaId => '0' }, hostingServer => { hostName => 'ba-hbase25.jumbo.hq.eset.com', port => '16020', startCode => '1604475904456' }, attempt => '179' } ] }
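To put a number on how long this procedure has been retrying, the START_TIME and LAST_UPDATE values above can be subtracted. This is only a small sketch: the helper name parse_ui_time is made up for illustration, and the zone abbreviation is dropped on the assumption that both timestamps are in the same zone (both say CET).

```python
from datetime import datetime


def parse_ui_time(text: str) -> datetime:
    """Parse a timestamp such as 'Thu Feb 18 06:31:06 CET 2021'.

    The zone token is removed because strptime's %Z handling is platform
    dependent; both values share the same zone, so the difference is unaffected.
    """
    parts = text.split()
    without_zone = " ".join(parts[:4] + parts[5:])
    return datetime.strptime(without_zone, "%a %b %d %H:%M:%S %Y")


start = parse_ui_time("Thu Feb 18 06:31:06 CET 2021")
last_update = parse_ui_time("Fri Feb 19 10:49:20 CET 2021")
attempts = 179  # the 'attempt' value from the procedure dump

elapsed = last_update - start
print(elapsed)  # 1 day, 4:18:14
print(round(elapsed.total_seconds() / attempts), "seconds per attempt on average")
```

The average is only indicative, since the retries are probably not evenly spaced.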

The HBase master UI (Table details) shows region 'hds2_md5,BDFFEEF,1535957697205.f25fe93e24b34cb2f7fffddee1d89eec' as being on region server ba-hbase18.jumbo.hq.com.
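As a cross-check that the procedure and the UI refer to the same region, the Base64 fields in the procedure dump can be decoded and compared with the region name. A minimal sketch, assuming the keys are plain ASCII (true for these values, but row keys can be arbitrary bytes in general):

```python
import base64

# Base64 values copied from the 'regionInfo' part of the procedure dump.
fields = {
    "namespace": "ZGVmYXVsdA==",
    "qualifier": "aGRzMl9tZDU=",
    "startKey": "QkRGRkVFRg==",
    "endKey": "QkVBQTgyMUQy",
}

# Assumption: the decoded bytes are ASCII text.
decoded = {
    name: base64.b64decode(value).decode("ascii")
    for name, value in fields.items()
}

region_id = "1535957697205"  # 'regionId' from the same dump
region_prefix = f"{decoded['qualifier']},{decoded['startKey']},{region_id}"

for name, value in decoded.items():
    print(f"{name:10} {value}")
print(region_prefix)  # hds2_md5,BDFFEEF,1535957697205
```

The decoded values are namespace 'default', table 'hds2_md5', start key 'BDFFEEF' and end key 'BEAA821D2', which matches the start of the region name shown in the UI.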

So, is the region hosted on server hbase25 and being moved to hbase18?

For some reason the table is not enabled at this time.

Table hds2_md5
Table Attributes

Attribute Name   Value   Description
Enabled          false   Is the table enabled
Compaction       NONE    Is the table compacting


The table has to be online to perform these kinds of moves, yes?  A RIT is not going to occur if the table is disabled, surely.

There was a network issue where net traffic went up on some paths as other paths went down.

So one question could be: was the table taken offline during this unassign? Then again, with more than 30000 regions, it is likely that other assigns/unassigns were being carried out on this and other tables at the same time.

Or was the table disabled with a view to performing some fix on this RIT? (Currently, the data 'owners' are unavailable for comment.)  The table has been offline for at least one day.

One of the techies stopped the regionserver instance on the hbase25 node to try and force some movement.

Thanks in advance.

RE: Region in Transition

Posted by Marc Hoppins <ma...@eset.sk>.

Sorry, it seems my cut/paste omitted the parent ID (PID) for most of these.  The PID is the initial DisableTableProcedure (73587).

Is there a sane method to kill the child procedure IDs first, then the parent ID?

-----Original Message-----
From: Marc Hoppins <ma...@eset.sk> 
Sent: Monday, February 22, 2021 12:16 PM
To: user@hbase.apache.org
Subject: RE: Region in Transition

Hi all,

Further:

Table 'hds2_md5' is disabled. The following exist (in the locks/procedures list):

ID      PARENT_ID  STATE            OWNER  TYPE
73587              WAITING          hbase  DisableTableProcedure table=hds2_md5
73827   73587      WAITING_TIMEOUT  hbase  UnassignProcedure table=hds2_md5
73937              RUNNABLE         jumbo  AssignProcedure table=hds2_md5
73938              RUNNABLE         jumbo  AssignProcedure table=hds2_md5
73949              RUNNABLE         jumbo  EnableTableProcedure table=hds2_md5
78370              RUNNABLE         jumbo  AssignProcedure table=hds2_md5
78371              RUNNABLE         jumbo  AssignProcedure table=hds2_md5
78372              RUNNABLE         jumbo  AssignProcedure table=hds2_md5
87386              RUNNABLE         jumbo  EnableTableProcedure table=hds2_md5
123914             RUNNABLE         hbase  EnableTableProcedure table=hds2_md5

And then a whole bunch of these:

73588   73587      SUCCESS          hbase  UnassignProcedure table=hds2_md5
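To see which procedures would have to be dealt with child-first, the pasted rows can be parsed and the unfinished children of a given parent listed ahead of the parent itself. This is a rough sketch only: Proc, parse_row and children_then_parent are names invented here, a missing parent column is treated as "unknown", and nothing in it talks to HBase or cancels anything.

```python
from typing import List, NamedTuple, Optional


class Proc(NamedTuple):
    pid: int
    parent: Optional[int]  # None when the paste has no parent column
    state: str
    owner: str
    kind: str


def parse_row(row: str) -> Proc:
    """Parse one whitespace-separated row: ID [PARENT_ID] STATE OWNER TYPE ..."""
    tokens = row.split()
    pid, rest = int(tokens[0]), tokens[1:]
    parent = None
    if rest[0].isdigit():  # the second column is only present for some rows
        parent, rest = int(rest[0]), rest[1:]
    return Proc(pid, parent, rest[0], rest[1], rest[2])


def children_then_parent(procs: List[Proc], parent_id: int) -> List[Proc]:
    """Return unfinished children of parent_id, followed by the parent."""
    children = [p for p in procs
                if p.parent == parent_id and p.state != "SUCCESS"]
    parents = [p for p in procs if p.pid == parent_id]
    return sorted(children) + parents


ROWS = """
73587           WAITING         hbase   DisableTableProcedure table=hds2_md5
73827   73587   WAITING_TIMEOUT hbase   UnassignProcedure table=hds2_md5
73937           RUNNABLE        jumbo   AssignProcedure table=hds2_md5
73938           RUNNABLE        jumbo   AssignProcedure table=hds2_md5
73949           RUNNABLE        jumbo   EnableTableProcedure table=hds2_md5
78370           RUNNABLE        jumbo   AssignProcedure table=hds2_md5
78371           RUNNABLE        jumbo   AssignProcedure table=hds2_md5
78372           RUNNABLE        jumbo   AssignProcedure table=hds2_md5
87386           RUNNABLE        jumbo   EnableTableProcedure table=hds2_md5
123914          RUNNABLE        hbase   EnableTableProcedure table=hds2_md5
73588   73587   SUCCESS         hbase   UnassignProcedure table=hds2_md5
"""

procs = [parse_row(line) for line in ROWS.strip().splitlines()]
order = children_then_parent(procs, 73587)
for p in order:
    print(p.pid, p.parent, p.state, p.kind)
```

With the rows as pasted, only 73827 is an unfinished child of 73587, so the order comes out as 73827 followed by 73587. Rows whose parent ID was lost in the paste are not matched to any parent.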

I am informed that the table can remain disabled afterwards.
What is the method to fix these issues?  I have built an 'operator-tools1.0.0' jar and dropped it in place on the active master.

Thanks again

Marc

