Posted to issues@hbase.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2019/03/21 20:27:00 UTC

[jira] [Created] (HBASE-22078) corrupted procs in proc WAL

Sergey Shelukhin created HBASE-22078:
----------------------------------------

             Summary: corrupted procs in proc WAL
                 Key: HBASE-22078
                 URL: https://issues.apache.org/jira/browse/HBASE-22078
             Project: HBase
          Issue Type: Bug
            Reporter: Sergey Shelukhin


I'm not sure what the root cause is. There are ~500 proc WAL files, and I see the lines below on master restart, so I wonder whether cleanup is also blocked by this: do WALs with abandoned procedures like these ever get deleted?
{noformat}
2019-03-21 12:47:17,116 ERROR [master/...:17000:becomeActiveMaster] wal.WALProcedureTree: Missing stack id 7571, max stack id is 7754, root procedure is Procedure(pid=66829, ppid=-1, class=org.apache.hadoop.hbase.master.procedure.DisableTableProcedure)
2019-03-21 12:47:17,116 ERROR [master/...:17000:becomeActiveMaster] wal.WALProcedureTree: Missing stack id 7600, max stack id is 7754, root procedure is Procedure(pid=66829, ppid=-1, class=org.apache.hadoop.hbase.master.procedure.DisableTableProcedure)
2019-03-21 12:47:17,116 ERROR [master/...:17000:becomeActiveMaster] wal.WALProcedureTree: Missing stack id 7610, max stack id is 7754, root procedure is Procedure(pid=66829, ppid=-1, class=org.apache.hadoop.hbase.master.procedure.DisableTableProcedure)
2019-03-21 12:47:17,116 ERROR [master/...:17000:becomeActiveMaster] wal.WALProcedureTree: Missing stack id 7631, max stack id is 7754, root procedure is Procedure(pid=66829, ppid=-1, class=org.apache.hadoop.hbase.master.procedure.DisableTableProcedure)
2019-03-21 12:47:17,116 ERROR [master/...:17000:becomeActiveMaster] wal.WALProcedureTree: Missing stack id 7650, max stack id is 7754, root procedure is Procedure(pid=66829, ppid=-1, class=org.apache.hadoop.hbase.master.procedure.DisableTableProcedure)
2019-03-21 12:47:17,116 ERROR [master/...:17000:becomeActiveMaster] wal.WALProcedureTree: Missing stack id 7651, max stack id is 7754, root procedure is Procedure(pid=66829, ppid=-1, class=org.apache.hadoop.hbase.master.procedure.DisableTableProcedure)
2019-03-21 12:47:17,116 ERROR [master/...:17000:becomeActiveMaster] wal.WALProcedureTree: Missing stack id 7657, max stack id is 7754, root procedure is Procedure(pid=66829, ppid=-1, class=org.apache.hadoop.hbase.master.procedure.DisableTableProcedure)
2019-03-21 12:47:17,116 ERROR [master/...:17000:becomeActiveMaster] wal.WALProcedureTree: Missing stack id 7683, max stack id is 7754, root procedure is Procedure(pid=66829, ppid=-1, class=org.apache.hadoop.hbase.master.procedure.DisableTableProcedure)
{noformat}
Followed by:
{noformat}
2019-03-20 07:37:53,751 ERROR [master/...:17000:becomeActiveMaster] procedure2.ProcedureExecutor: Corrupt pid=66829, state=WAITING:DISABLE_TABLE_ADD_REPLICATION_BARRIER, hasLock=false; DisableTableProcedure table=...
{noformat}
And thousands of child and grandchild procedures of this procedure.
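
For anyone reading the "Missing stack id" lines above: they suggest a check that every stack id from 0 up to the maximum seen for a root procedure is present. The following is only a minimal sketch of that kind of gap check, not the actual WALProcedureTree code; the class and method names (`StackIdGapCheck`, `missingStackIds`) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of HBase.
final class StackIdGapCheck {

  /**
   * Returns every id in [0, max(stackIds)] that does not appear in stackIds,
   * in ascending order. An empty input yields an empty result.
   */
  static List<Integer> missingStackIds(List<Integer> stackIds) {
    Set<Integer> seen = new HashSet<>(stackIds);

    // Find the largest stack id recorded for this procedure tree.
    int max = -1;
    for (int id : stackIds) {
      max = Math.max(max, id);
    }

    // Any id below the maximum that was never recorded is a gap.
    List<Integer> missing = new ArrayList<>();
    for (int id = 0; id <= max; id++) {
      if (!seen.contains(id)) {
        missing.add(id);
      }
    }
    return missing;
  }
}
```

With stack ids 0, 1, 3, 4, 6 this returns 2 and 5, which would correspond to two "Missing stack id" lines with a max stack id of 6.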

I think this area needs a general overview. We should have a record for the procedure durably persisted before we create any child procedures, so I'm not sure how this could happen. I also wonder why we even have a separate proc WAL when HBase already has a working WAL that is more or less time-tested.
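
To make the expected ordering concrete, here is a rough sketch of the invariant I have in mind: a child may only be recorded once its parent already has been. This is an in-memory stand-in written purely for illustration; `ParentFirstStore` and its methods are hypothetical and are not how the procedure store is actually implemented.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory stand-in for a procedure store; not HBase's API.
final class ParentFirstStore {
  // Ids of procedures that have been recorded so far.
  private final Set<Long> persisted = new HashSet<>();

  /** Records a root procedure, which has no parent. */
  void persistRoot(long pid) {
    persisted.add(pid);
  }

  /** Records a child procedure, but only if its parent is already recorded. */
  void persistChild(long pid, long parentPid) {
    if (!persisted.contains(parentPid)) {
      throw new IllegalStateException(
          "parent pid=" + parentPid + " not persisted before child pid=" + pid);
    }
    persisted.add(pid);
  }

  /** Returns true if the given procedure id has been recorded. */
  boolean isPersisted(long pid) {
    return persisted.contains(pid);
  }
}
```

If that ordering held end to end, a root procedure could not end up with children on disk while its own records are missing, which is why the errors above are surprising.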


