
Posted to user@impala.apache.org by Baptiste Mille-Mathias <ba...@gmail.com> on 2018/07/26 12:21:56 UTC

Rolling restart of Impala cluster, API for setting coordinator and executor roles

Hello,

In operations I regularly have to stop a node, or even perform a rolling
restart of a whole cluster, to apply a system patch or a configuration
update. The cluster runs Impala 2.10 behind a load balancer (haproxy).
The problem is that when an Impala server is stopped (whether it is a
coordinator or an executor), all the queries it is handling are killed and
clients receive an error, which is quite bad. A rolling restart therefore
causes as many interruptions as there are nodes.

I've looked for a way to remove both roles dynamically, so that nodes can be
moved cleanly out of the cluster before the service is actually stopped and
no service interruption is experienced, but I did not find such an API (I
only saw that the roles can be set in the configuration file).
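
The drain-before-stop procedure described above could be sketched roughly as
follows. This is only an illustration under stated assumptions, not an Impala
API: the four callbacks (drain, in_flight, restart, enable) are hypothetical
hooks that the operator would have to implement, for example against the load
balancer and the service manager. Note also that draining at the load
balancer only keeps new client connections away from a coordinator; as far as
I understand, it does not stop other coordinators from scheduling query
fragments on a node that also acts as an executor.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    nodes: Iterable[str],
    drain: Callable[[str], None],      # hypothetical: stop routing new clients to the node
    in_flight: Callable[[str], int],   # hypothetical: number of queries still running on the node
    restart: Callable[[str], None],    # hypothetical: restart the Impala service on the node
    enable: Callable[[str], None],     # hypothetical: put the node back into rotation
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Restart nodes one at a time.

    Returns the nodes that were restarted while queries were still
    running because the drain timeout expired.
    """
    forced = []
    for node in nodes:
        drain(node)
        waited = 0.0
        remaining = in_flight(node)
        # Wait for running queries to finish, up to the timeout.
        while remaining > 0 and waited < timeout_seconds:
            sleep(poll_seconds)
            waited += poll_seconds
            remaining = in_flight(node)
        if remaining > 0:
            # The timeout expired; the remaining queries will be killed.
            forced.append(node)
        restart(node)
        enable(node)
    return forced
```

Only one node is out of rotation at any time, and the returned list tells the
operator on which nodes queries were still interrupted.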

Is it possible? If not, how do you handle this scenario?

Thanks for your advice.
-- 
Happy people are not in a hurry

Re: Rolling restart of Impala cluster, API for setting coordinator and executor roles

Posted by Baptiste Mille-Mathias <ba...@gmail.com>.

Hello Quanlong,

Thanks for the pointer and for the strategy you described.

Regards

On Fri, 27 Jul 2018 at 15:52, Quanlong Huang <hu...@126.com> wrote:

> Hi Baptiste,
>
> There is ongoing work to enhance impalad so that it can shut down
> gracefully: https://gerrit.cloudera.org/c/10744/ Thanks to Tim for his
> efforts on this; I hope it can be merged soon (so that the patch is easier
> to merge into the 2.x branch).
>
> We've faced a similar scenario before. One way to mitigate the service
> interruption is to set up another Impala cluster as a temporary backup,
> switch the load balancer over to the backup cluster, and perform the
> lengthy maintenance on the original cluster. Once everything is done,
> switch the load balancer back to the original cluster.
> Hope this helps.
>
> Regards,
> Quanlong
> --
> Quanlong Huang
> Software Developer, Hulu

--
Happy people are not in a hurry

Re: Rolling restart of Impala cluster, API for setting coordinator and executor roles

Posted by Quanlong Huang <hu...@126.com>.

Hi Baptiste,

There is ongoing work to enhance impalad so that it can shut down gracefully: https://gerrit.cloudera.org/c/10744/ Thanks to Tim for his efforts on this; I hope it can be merged soon (so that the patch is easier to merge into the 2.x branch).

We've faced a similar scenario before. One way to mitigate the service interruption is to set up another Impala cluster as a temporary backup, switch the load balancer over to the backup cluster, and perform the lengthy maintenance on the original cluster. Once everything is done, switch the load balancer back to the original cluster.
Hope this helps.
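
For illustration, the switch to the backup cluster could be scripted against
HAProxy's runtime API, which accepts "set server <backend>/<server> state
ready|drain|maint" on its admin socket. The sketch below is only one possible
way to do it and makes several assumptions that are not from this thread: the
backend name, the server names and the socket path are hypothetical, and the
stats socket must be configured with admin level in haproxy.cfg.

```python
import socket
from typing import Dict, List

ALLOWED_STATES = {"ready", "drain", "maint"}


def switch_commands(backend: str, states: Dict[str, str]) -> List[str]:
    """Build one HAProxy 'set server' command per server."""
    commands = []
    for server, state in states.items():
        if state not in ALLOWED_STATES:
            raise ValueError(f"unsupported state {state!r} for {server}")
        commands.append(f"set server {backend}/{server} state {state}")
    return commands


def send_to_haproxy(socket_path: str, commands: List[str]) -> List[str]:
    """Send each command over its own connection to the admin socket."""
    replies = []
    for command in commands:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.connect(socket_path)
            sock.sendall((command + "\n").encode())
            chunks = []
            # HAProxy closes the connection once it has answered.
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
        replies.append(b"".join(chunks).decode())
    return replies


# Hypothetical names: move traffic from the original cluster to the backup.
to_backup = switch_commands(
    "impala", {"backup1": "ready", "origin1": "drain"}
)
# send_to_haproxy("/var/run/haproxy.sock", to_backup)  # assumed socket path
```

Using "drain" rather than "maint" for the original cluster lets connections
that are already established carry on, so running queries get a chance to
finish before maintenance starts. Switching back is the same call with the
states reversed.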

Regards,
Quanlong
--
Quanlong Huang
Software Developer, Hulu
