Posted to issues@phoenix.apache.org by "Viraj Jasani (Jira)" <ji...@apache.org> on 2020/12/27 18:09:00 UTC

[jira] [Commented] (PHOENIX-6104) SplitSystemCatalogIT tests very unstable with Hbase 2.3

    [ https://issues.apache.org/jira/browse/PHOENIX-6104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255293#comment-17255293 ] 

Viraj Jasani commented on PHOENIX-6104:
---------------------------------------

[~stoty] I spent some time on this one but didn't realize that this Jira had already been created.

I believe we are not using the correct strategy to split SYSTEM.CATALOG synchronously. What we are doing is:
{code:java}
admin.split(fullTableName, splitPoint);
// make sure the split finishes (there's no synchronous splitting before HBase 2.x)
admin.disableTable(fullTableName);
admin.enableTable(fullTableName);

{code}
With HBase 2.3, the split runs asynchronously. While SplitTableProcedure is still executing, we ask Admin to disable the table, and this seems problematic: it causes an NPE while retrieving the RegionNode's location when unassigning the region:
{code:java}
2020-12-26 14:17:18,043 ERROR [PEWorker-13] org.apache.hadoop.hbase.procedure2.ProcedureExecutor(1688): CODE-BUG: Uncaught runtime exception: pid=125, ppid=119, state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE, locked=true; TransitRegionStateProcedure table=SYSTEM.CATALOG, region=62da70c1cc98a8e5e0dd93cd7abce3a8, UNASSIGN
java.lang.NullPointerException
	at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
	at org.apache.hadoop.hbase.master.assignment.RegionStates.getOrCreateServer(RegionStates.java:742)
	at org.apache.hadoop.hbase.master.assignment.RegionStates.addRegionToServer(RegionStates.java:777)
	at org.apache.hadoop.hbase.master.assignment.AssignmentManager.regionClosing(AssignmentManager.java:1807)
	at org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.closeRegion(TransitRegionStateProcedure.java:267)

{code}
I was thinking that, instead of disabling and re-enabling SYSTEM.CATALOG, we should wait for the table split to complete, the way we do in TableSnapshotReadsMapReduceIT.splitTableSync(). Maybe we can move that method to BaseTest. Thoughts?
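
To make the "wait for the split" idea concrete, here is a minimal sketch of a polling helper. It is not the body of TableSnapshotReadsMapReduceIT.splitTableSync(), which I am not reproducing here; the class name SplitWait, the method name, and the timeout parameters are all hypothetical. It only assumes that the caller can supply the current region count of the table.

```java
import java.util.function.IntSupplier;

/** Hypothetical helper for illustration; not part of Phoenix or HBase. */
final class SplitWait {
    private SplitWait() {}

    /**
     * Polls regionCount until it reports at least expectedRegions,
     * or until timeoutMs has elapsed.
     *
     * @return true if the expected count was observed, false on timeout or interrupt
     */
    static boolean waitForRegionCount(IntSupplier regionCount, int expectedRegions,
                                      long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (regionCount.getAsInt() >= expectedRegions) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out before the split became visible
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

In the test setup, the supplier could wrap something like admin.getRegions(tableName).size() (handling the IOException inside the lambda), with the region count recorded before calling admin.split() and the helper waiting for that count plus one. Whether the region count alone is a sufficient signal that the split procedure has fully finished is an assumption that would need checking against HBase 2.3.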

> SplitSystemCatalogIT tests very unstable with Hbase 2.3
> -------------------------------------------------------
>
>                 Key: PHOENIX-6104
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6104
>             Project: Phoenix
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 5.1.0
>            Reporter: Istvan Toth
>            Assignee: Istvan Toth
>            Priority: Major
>         Attachments: 6104-testouput.log
>
>
> The failure is in the test preparation code, where we split the system catalog table. It seems to be an HBase issue rather than a Phoenix one, but we need to track the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)