Posted to issues@hbase.apache.org by "Duo Zhang (JIRA)" <ji...@apache.org> on 2018/11/17 12:49:00 UTC
[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures

    [ https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16690533#comment-16690533 ] 

Duo Zhang commented on HBASE-21490:
-----------------------------------

OK, the root cause is a bug in RecoverStandByProcedure: there is an NPE when loading it, which brings the master down. But after two restarts, the file that contains the procedures is deleted.

{noformat}
2018-11-16,20:43:37,454 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true	ugi=hbase_tst/hadoop@XIAOMI.HADOOP (auth:KERBEROS)	ip=/10.132.16.33	cmd=create	src=/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000185.log   perm=hbase_tst:supergroup:rw-r-----	proto=rpc
2018-11-16,21:05:58,652 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true	ugi=hbase_tst/hadoop@XIAOMI.HADOOP (auth:KERBEROS)	ip=/10.132.16.34	cmd=open	src=/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000185.log   proto=rpc
2018-11-16,21:05:58,747 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true	ugi=hbase_tst/hadoop@XIAOMI.HADOOP (auth:KERBEROS)	ip=/10.132.16.34	cmd=open	src=/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000185.log   proto=rpc
2018-11-16,21:06:04,196 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true	ugi=hbase_tst/hadoop@XIAOMI.HADOOP (auth:KERBEROS)	ip=/10.132.16.34	cmd=open	src=/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000185.log   proto=rpc
2018-11-16,21:06:04,305 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true	ugi=hbase_tst/hadoop@XIAOMI.HADOOP (auth:KERBEROS)	ip=/10.132.16.34	cmd=open	src=/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000185.log   proto=rpc
2018-11-16,21:06:04,669 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true	ugi=hbase_tst/hadoop@XIAOMI.HADOOP (auth:KERBEROS)	ip=/10.132.16.34	cmd=rename	src=/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000185.log   dst=/hbase/c4tst-sync1/oldWALs/pv2-00000000000000000185.log	perm=hbase_tst:supergroup:rw-r-----	proto=rpc
2018-11-16,21:07:12,776 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true	ugi=hbase_tst/hadoop@XIAOMI.HADOOP (auth:KERBEROS)	ip=/10.132.16.34	cmd=delete	src=/hbase/c4tst-sync1/oldWALs/pv2-00000000000000000185.log	
{noformat}
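
For anyone who wants to trace a proc WAL file through a longer audit log, here is a minimal sketch that groups audit events by file name. It assumes the `key=value` layout seen in the lines above; the helper names (`parse_audit_line`, `file_lifecycle`) are made up for illustration and are not part of HBase or HDFS. Events are keyed by base name, so a rename from MasterProcWALs to oldWALs stays in the same history.

```python
import re
from collections import defaultdict

# key=value pairs as they appear in the audit lines above (assumed layout).
FIELD_RE = re.compile(r"(\w+)=(\S+)")

# Three lines shortened from the audit log in this comment.
SAMPLE = [
    "2018-11-16,20:43:37,454 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: "
    "allowed=true\tip=/10.132.16.33\tcmd=create\t"
    "src=/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000185.log\tproto=rpc",
    "2018-11-16,21:06:04,669 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: "
    "allowed=true\tip=/10.132.16.34\tcmd=rename\t"
    "src=/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000185.log\t"
    "dst=/hbase/c4tst-sync1/oldWALs/pv2-00000000000000000185.log\tproto=rpc",
    "2018-11-16,21:07:12,776 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: "
    "allowed=true\tip=/10.132.16.34\tcmd=delete\t"
    "src=/hbase/c4tst-sync1/oldWALs/pv2-00000000000000000185.log",
]


def parse_audit_line(line):
    """Return (timestamp, fields) for one audit line, or None if it has no cmd/src."""
    timestamp = line.split(" ", 1)[0]
    fields = dict(FIELD_RE.findall(line))
    if "cmd" not in fields or "src" not in fields:
        return None
    return timestamp, fields


def file_lifecycle(lines):
    """Group (timestamp, cmd, ip) events by the base name of the src path."""
    events = defaultdict(list)
    for line in lines:
        parsed = parse_audit_line(line)
        if parsed is None:
            continue
        timestamp, fields = parsed
        name = fields["src"].rsplit("/", 1)[-1]
        events[name].append((timestamp, fields["cmd"], fields.get("ip", "")))
    return dict(events)


if __name__ == "__main__":
    for name, history in file_lifecycle(SAMPLE).items():
        print(name)
        for timestamp, cmd, ip in history:
            print(f"  {timestamp}  {cmd:<7} from {ip}")
```

Comparing the `ip` of each event makes it easy to see which master touched the file at each step, for example that the create came from 10.132.16.33 while the rename and delete came from 10.132.16.34.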

Let me check what is going on here...

> WALProcedure may remove proc wal files still with active procedures
> -------------------------------------------------------------------
>
>                 Key: HBASE-21490
>                 URL: https://issues.apache.org/jira/browse/HBASE-21490
>             Project: HBase
>          Issue Type: Sub-task
>          Components: proc-v2
>            Reporter: Duo Zhang
>            Priority: Major
>
> This has happened to me several times. After a master restart, all the procedures are gone.
> The proc WAL files were deleted before the restart; I see this in the master's log:
> {noformat}
> 2018-11-16,20:57:40,177 INFO [WALProcedureStoreSyncThread] org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: Remove all state logs with ID less than 184, since all the active procedures are in the latest log
> 2018-11-16,20:57:40,177 INFO [WALProcedureStoreSyncThread] org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFile: Archiving hdfs://c4tst-xiaomi/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000184.log to hdfs://c4tst-xiaomi/hbase/c4tst-sync1/oldWALs/pv2-00000000000000000184.log
> {noformat}
> Let me dig...


