Posted to user@phoenix.apache.org by Pedro Gandola <pe...@gmail.com> on 2016/01/05 23:18:34 UTC

Can phoenix local indexes create a deadlock after an HBase full restart?

Hi Guys,

I have been testing out Phoenix local indexes and I'm facing an issue
after restarting the entire HBase cluster.

*Scenario:* I'm using Phoenix 4.4 and HBase 1.1.1. My test cluster contains
10 machines, and the main table contains 300 pre-split regions, which implies
300 regions on the local index table as well. To configure Phoenix I followed
this tutorial:
<http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_installing_manually_book/content/configuring-hbase-for-phoenix.html>
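
For reference, the local-index related properties from that tutorial look
roughly like this in hbase-site.xml (reproduced from memory here, so please
check the linked page for the exact names and values):

<!-- On the HBase master -->
<property>
  <name>hbase.master.loadbalancer.class</name>
  <value>org.apache.phoenix.hbase.index.balancer.IndexLoadBalancer</value>
</property>
<property>
  <name>hbase.coprocessor.master.classes</name>
  <value>org.apache.phoenix.hbase.index.master.IndexMasterObserver</value>
</property>

<!-- On every region server -->
<property>
  <name>hbase.coprocessor.regionserver.classes</name>
  <value>org.apache.hadoop.hbase.regionserver.LocalIndexMerger</value>
</property>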

When I start a fresh cluster everything is fine: the local index is
created and I can insert data and query it using the proper indexes. The
problem comes when I perform a full restart of the cluster to update some
configuration; at that point I'm not able to bring the cluster back up anymore.
I should probably be doing a proper rolling restart, but it looks like Ambari
is not doing one in some situations.

Most of the servers are throwing exceptions like:

INFO  [htable-pool7-t1] client.AsyncProcess: #5,
> table=_LOCAL_IDX_BIDDING_EVENTS, attempt=27/350 failed=1ops, last
> exception: org.apache.hadoop.hbase.NotServingRegionException:
> org.apache.hadoop.hbase.NotServingRegionException: Region
> _LOCAL_IDX_BIDDING_EVENTS,57e4b17e4b17e4ac,1451943466164.253bdee3695b566545329fa3ac86d05e.
> is not online on ip-10-5-4-24.ec2.internal,16020,1451996088952
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2898)
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:947)
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1991)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> at java.lang.Thread.run(Thread.java:745)
>  on ip-10-5-4-24.ec2.internal,16020,1451942002174, tracking started null,
> retrying after=20001ms, replay=1ops
> INFO
>  [ip-10-5-4-26.ec2.internal,16020,1451996087089-recovery-writer--pool5-t1]
> client.AsyncProcess: #3, waiting for 2  actions to finish
> INFO
>  [ip-10-5-4-26.ec2.internal,16020,1451996087089-recovery-writer--pool5-t2]
> client.AsyncProcess: #4, waiting for 2  actions to finish


It looks like they are getting into a state where some region servers are
waiting for regions that are not yet available on other servers.

On the HBase UI I can see servers stuck on messages like this:

*Description:* Replaying edits from
> hdfs://.../recovered.edits/0000000000000464197
> *Status:* Running pre-WAL-restore hook in coprocessors (since 48mins,
> 45sec ago)


Another interesting thing I noticed is the *empty coprocessor list* for
the servers that are stuck with 0 regions assigned.

The HBase master goes down after logging a few messages like this:

GeneralBulkAssigner: Failed bulking assigning N regions


I was able to perform full restarts before I started using local indexes and
everything worked fine. This could well be a misconfiguration on my side, but
I have tried different properties and different ways of restarting the
cluster and I'm still unable to bring it back up.

My understanding of local indexes in Phoenix (please correct me if I'm
wrong) is that they are normal HBase tables and Phoenix places their regions
so as to ensure data locality with the data table. Is that data locality fully
maintained when we lose N region servers and/or regions are moved?

Any insights would be very helpful.

Thank you
Cheers
Pedro

Re: Can phoenix local indexes create a deadlock after an HBase full restart?

Posted by Artem Ervits <ar...@gmail.com>.
This was answered in this thread:
https://community.hortonworks.com/questions/8757/phoenix-local-indexes.html


Re: Can phoenix local indexes create a deadlock after an HBase full restart?

Posted by Pedro Gandola <pe...@gmail.com>.
Hi Guys,

The issue is indeed a deadlock, but it's not related to Phoenix itself. As far
as I understand, during WAL replay the pre-WAL-restore hook of a data-table
region waits for its local index region to come online, and with the default
open-region thread pool all the handlers can end up blocked waiting on regions
that are still queued behind them. It can be resolved by increasing the number
of threads responsible for opening regions:

<property>
  <name>hbase.regionserver.executor.openregion.threads</name>
  <value>100</value>
</property>


Got help from here:
<https://community.hortonworks.com/questions/8757/phoenix-local-indexes.html>
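
In case it helps anyone else hitting the same thing, here is a minimal sketch
of how the change looks in hbase-site.xml; the comments are my own notes, and
100 is simply the value I picked rather than a tuned recommendation:

<!-- Needs to be set on every region server and requires a region server
     restart to take effect. The built-in default for this executor is only a
     handful of threads (3, if I recall correctly), which is why the open
     handlers can all end up blocked waiting on each other during WAL replay. -->
<property>
  <name>hbase.regionserver.executor.openregion.threads</name>
  <value>100</value>
</property>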

Thanks
Cheers
Pedro
