
Posted to user@phoenix.apache.org by Neelesh <ne...@gmail.com> on 2016/12/05 22:35:40 UTC

Global Indexes and impact on availability

Hello,
  When a region server is under stress (hotspotting, heavy replication load,
call queues hitting their size limit, other processes competing with HBase,
etc.), we see latency spikes for all regions hosted by that region server.
In plain HBase this is somewhat expected.

However, with a Phoenix global index, this degradation seems to propagate
to many more region servers whenever the affected RS hosts index regions.
The corresponding data regions live on another RS, and latencies on that RS
spike because it cannot complete its index update calls quickly. That second
RS then causes problems on yet another one, and so on.
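To make the cascade concrete, here is a toy Python model. It is a
simplification, not a description of Phoenix internals. It assumes three
things. First, a write to a data region is acknowledged only after its global
index updates complete. Second, those updates run in parallel, so the writer
waits on the slowest index-hosting server. Third, a server whose handlers are
tied up waiting passes that delay on to anyone writing index rows to it. The
server names, latencies and the `effective_write_latency` helper are all
hypothetical.

```python
# Toy model (hypothetical names and numbers, not Phoenix internals) of how a
# slow region server hosting *global* index regions inflates write latency
# on the servers whose data regions depend on it.
# Assumptions: a data write waits for its synchronous index updates, which
# are issued in parallel (so the slowest one dominates), and a server stuck
# waiting passes that delay on to writers of index rows it hosts.
from typing import Dict, List


def effective_write_latency(
    base_ms: Dict[str, float],
    index_deps: Dict[str, List[str]],
) -> Dict[str, float]:
    """Return the modeled write latency (ms) observed at each region server.

    base_ms:    local write cost per server, including any stress it is under
    index_deps: server -> servers hosting global index regions for the data
                regions it serves (must be acyclic in this toy model)
    """
    result: Dict[str, float] = {}
    visiting = set()

    def latency(rs: str) -> float:
        if rs in result:
            return result[rs]
        if rs in visiting:
            raise ValueError(f"cyclic index dependency at {rs}")
        visiting.add(rs)
        # Wait for the slowest synchronous index update, if there are any.
        wait = max((latency(dep) for dep in index_deps.get(rs, [])), default=0.0)
        visiting.discard(rs)
        result[rs] = base_ms[rs] + wait
        return result[rs]

    for rs in base_ms:
        latency(rs)
    return result


if __name__ == "__main__":
    # rs1 is stressed (e.g. hotspotting). rs2 serves data regions whose global
    # index lives on rs1; rs3 serves data regions whose global index is on rs2.
    base = {"rs1": 500.0, "rs2": 5.0, "rs3": 5.0}
    deps = {"rs2": ["rs1"], "rs3": ["rs2"]}
    print(effective_write_latency(base, deps))
    # {'rs1': 500.0, 'rs2': 505.0, 'rs3': 510.0}
```

Only rs1 is actually under stress, yet in this model every server downstream
of it in the index dependency chain inherits its latency.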

We've seen this happen on our cluster, and we deal with it by "fixing" the
original RS (splitting regions, restarting, or moving regions around),
depending on what the problem is.

Has anyone experienced this issue? It feels like antithetical behavior for
a distributed system: the cluster breaks down for the very reasons it's
supposed to protect against.

I'd love to hear the Phoenix community's thoughts on this.

Re: Global Indexes and impact on availability

Posted by Neelesh <ne...@gmail.com>.

Local indexes would indeed solve this problem, at the cost of some penalty
at read time. Unfortunately, our vendor distribution (HortonWorks) still
does not have all the bug fixes required for local indexes to work in a
production setting. They still consider local indexes beta and explicitly
advise against using them for now.
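A rough Python sketch of the trade-off mentioned here. It rests on
simplifying assumptions. A global index is a separate table whose regions may
be hosted on a different region server than the data row. A local index keeps
index rows in the same region as the data, so index maintenance never leaves
that server. The price is that a lookup through a local index has to probe
every data region unless other predicates prune them. The server names, the
region count and both helper functions are hypothetical.

```python
# Toy sketch (hypothetical names and counts) of the global vs. local index
# trade-off, under simplifying assumptions:
#  - global index: separate table, its regions may sit on another server;
#  - local index: index rows co-located with the data region's rows;
#  - a point lookup hits one global index region, but must probe every data
#    region with a local index (assuming no other predicate prunes regions).
from typing import Set


def servers_on_write_path(index_type: str, data_rs: str, index_rs: str) -> Set[str]:
    """Region servers that must respond before an indexed write completes."""
    if index_type == "global":
        return {data_rs, index_rs}
    if index_type == "local":
        return {data_rs}  # index_rs is irrelevant: index rows are co-located
    raise ValueError(f"unknown index type: {index_type}")


def regions_probed_on_lookup(index_type: str, num_data_regions: int) -> int:
    """Index regions consulted for one point lookup on the indexed column."""
    if index_type == "global":
        return 1
    if index_type == "local":
        return num_data_regions
    raise ValueError(f"unknown index type: {index_type}")


if __name__ == "__main__":
    for kind in ("global", "local"):
        print(kind,
              sorted(servers_on_write_path(kind, "rs2", "rs1")),
              regions_probed_on_lookup(kind, num_data_regions=40))
    # global ['rs1', 'rs2'] 1
    # local ['rs2'] 40
```

With a local index, a stressed rs1 drops out of rs2's write path entirely,
and the cost moves to reads, which fan out across all data regions.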

I was interested in seeing whether anyone in the community has experienced
similar issues with global indexes.

On Mon, Dec 5, 2016 at 2:39 PM, James Taylor <ja...@apache.org> wrote:

> Have you tried local indexes?

Re: Global Indexes and impact on availability

Posted by James Taylor <ja...@apache.org>.

Have you tried local indexes?
