Posted to dev@hbase.apache.org by Hossein Zolfi <ho...@gmail.com> on 2018/12/09 16:49:37 UTC

+How can I abort pending procedures?

Hi,
I ran the HBase performance tools, and thousands of tables were created. Our
cluster is now in an inconsistent state (we don't know the cause yet, but we
are trying to find it). At first I tried to disable/drop the created tables
(1700 tables), but nothing happened. list_procedures shows 492 rows, and it is
not possible to abort any of them. Then I restarted the HMaster service, and
now I get an endless stream of the following exception:

2018-12-09 20:01:30,194 WARN  [MASTER_SERVER_OPERATIONS-master-4:16000-0] master.AssignmentManager: Failed assignment of t53889,00000000000000000007603345,1542715604227.4cc63591941dbe92866388fbde075cac. to data-22-54,16020,1543392184445, waiting a little before trying on the same region server try=1 of 10
org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException: org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException: Received OPEN for the region:t53889,00000000000000000007603345,1542715604227.4cc63591941dbe92866388fbde075cac. , which we are already trying to CLOSE
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.openRegion(RSRpcServices.java:1604)
        at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22239)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2196)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
        at java.lang.Thread.run(Thread.java:748)

        at sun.reflect.GeneratedConstructorAccessor10.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
        at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:330)
        at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:772)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2164)
        at org.apache.hadoop.hbase.master.AssignmentManager$2.process(AssignmentManager.java:860)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

How can we stop these log messages?

Output of `list_procedures` contains something like this:

1530 DisableTableProcedure (table=t2151) FINISHED Fri Dec 07 11:32:45 +0330 2018 Sun Dec 09 20:07:59 +0330 2018
1532 DisableTableProcedure (table=t21514) FINISHED Fri Dec 07 11:42:53 +0330 2018 Sun Dec 09 20:07:27 +0330 2018
1534 DisableTableProcedure (table=t21518) FINISHED Fri Dec 07 11:53:02 +0330 2018 Sun Dec 09 20:07:57 +0330 2018
1535 DeleteTableProcedure (table=t13946) FINISHED Fri Dec 07 12:02:59 +0330 2018 Sun Dec 09 20:07:27 +0330 2018


I don't know whether removing /hbase/MasterProcWALs from HDFS will cause
problems or not.

Any help will be appreciated.

With best regards.

Re: +How can I abort pending procedures?

Posted by Xu Cang <xc...@salesforce.com>.
Hi Hossein,
If you are facing this issue on HBase branch-1, you cannot use hbck2.
Also be aware that some procedures are not abortable. The most practical
solution to your issue is to follow what Wellington suggested.
Removing "/hbase/MasterProcWALs" will remove all master procedures for
your cluster. I suggest backing up this directory before removing it. Then
you can fail over the active HMaster and retry creating the tables you need.
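
For illustration only, the backup-then-remove step could be sketched as below. This is a dry-run helper written under assumptions, not a tested recovery procedure: the backup location /hbase-backup/MasterProcWALs and the function names are hypothetical, and the script only prints the `hdfs dfs` commands unless execute=True is passed.

```python
import posixpath
import subprocess


def procwal_cleanup_plan(src="/hbase/MasterProcWALs",
                         backup="/hbase-backup/MasterProcWALs"):
    """Return the hdfs commands, in order: make backup parent, copy, remove.

    `backup` is a hypothetical location; pick one that suits your cluster.
    """
    parent = posixpath.dirname(backup)
    return [
        ["hdfs", "dfs", "-mkdir", "-p", parent],  # ensure the backup parent exists
        ["hdfs", "dfs", "-cp", src, backup],      # back up the procedure WALs
        ["hdfs", "dfs", "-rm", "-r", src],        # then remove the original directory
    ]


def run_plan(plan, execute=False):
    """Print each command; run it only when execute=True."""
    for cmd in plan:
        print(" ".join(cmd))
        if execute:
            # check=True raises on the first failure, so the remove step
            # is never reached if the copy did not succeed.
            subprocess.run(cmd, check=True)


# Dry run: shows what would be executed without touching HDFS.
run_plan(procwal_cleanup_plan())
```

Keeping the copy and the remove as separate steps means the directory is deleted only after the backup command has returned successfully.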

Best,
Xu


Re: +How can I abort pending procedures?

Posted by Wellington Chevreuil <we...@gmail.com>.
Hi Hossein, which HBase version are you seeing this issue on?
Removing "/hbase/MasterProcWALs" would probably help resolve the
mentioned error, but there is some risk of creating other
inconsistencies, depending on which procedures are running. Does the
list_procedures command show any "running" procedures, or does it list only
finished ones?
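
One way to answer that question across several hundred rows is to save the list_procedures output to a file and tally the states. The sketch below is only an illustration: it assumes each row sits on a single line in the shape shown in the original post (pid, procedure name, parenthesised details, state, timestamps), and it treats anything that is not FINISHED as still pending. The function names are made up for this example.

```python
import re
from collections import Counter

# Row shape taken from the output quoted in this thread:
#   <pid> <ProcedureName> (<details>) <STATE> <timestamps...>
ROW = re.compile(r"^\s*(\d+)\s+(\w+)\s+\((.*?)\)\s+([A-Z_]+)\b")


def parse_procedures(text):
    """Return (pid, name, details, state) for every line that matches ROW."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((int(m.group(1)), m.group(2), m.group(3), m.group(4)))
    return rows


def summarize(rows):
    """Count rows per state and collect the rows that are not FINISHED."""
    counts = Counter(state for _, _, _, state in rows)
    pending = [row for row in rows if row[3] != "FINISHED"]
    return counts, pending


sample = """\
1530 DisableTableProcedure (table=t2151) FINISHED Fri Dec 07 11:32:45 +0330 2018 Sun Dec 09 20:07:59 +0330 2018
1535 DeleteTableProcedure (table=t13946) FINISHED Fri Dec 07 12:02:59 +0330 2018 Sun Dec 09 20:07:27 +0330 2018
"""
counts, pending = summarize(parse_procedures(sample))
print(dict(counts))
print(pending)
```

If the pending list is empty, only finished procedures remain in the store.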

Re: +How can I abort pending procedures?

Posted by Sakthi Vel <sa...@gmail.com>.
Hi Hossein,

Aborting procedures can be dangerous (especially if the procedure is not
rolled back). AFAIK, you can use the hbck2 tool (apache/hbase-operator-tools)
to abort a procedure using its 'bypass' command. I would like to quote the
official hbck2 doc here:

 bypass [OPTIONS] <PID>...
   Options:
    -o,--override   override if procedure is running/stuck
    -r,--recursive  bypass parent and its children. SLOW! EXPENSIVE!
    -w,--lockWait   milliseconds to wait on lock before giving up; default=1
   Pass one (or more) procedure 'pid's to skip to procedure finish.
   Parent of bypassed procedure will also be skipped to the finish.
   Entities will be left in an inconsistent state and will require
   manual fixup. May need Master restart to clear locks still held.
   Bypass fails if procedure has children. Add 'recursive' if all
   you have is a parent pid to finish parent and children. This
   is SLOW, and dangerous so use selectively. Does not always work.
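
As a rough sketch of how the usage text above maps onto a command line, the helper below only assembles the argument list and runs nothing. It assumes hbck2 is launched as `hbase hbck -j <jar>`, which may not hold for every HBase version; the jar path and the pid are placeholders, and the helper name is made up for this example.

```python
def bypass_command(pids, jar, override=False, recursive=False, lock_wait_ms=None):
    """Build an argv list for `bypass [OPTIONS] <PID>...` as quoted above.

    Assumption: hbck2 is started via `hbase hbck -j <jar>`.
    """
    if not pids:
        raise ValueError("at least one procedure pid is required")
    cmd = ["hbase", "hbck", "-j", jar, "bypass"]
    if override:
        cmd.append("-o")                      # override if procedure is running/stuck
    if recursive:
        cmd.append("-r")                      # bypass parent and its children
    if lock_wait_ms is not None:
        cmd += ["-w", str(lock_wait_ms)]      # milliseconds to wait on the lock
    cmd += [str(int(pid)) for pid in pids]    # pids come last, after the options
    return cmd


# Placeholder jar path and pid, for illustration only.
print(" ".join(bypass_command([1234], "/path/to/hbase-hbck2.jar", override=True)))
```

Building the list separately makes it easy to review the exact command before running it, which matters given the warnings in the quoted doc.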

+Other members, please correct me if I am wrong.

Sakthi
