Posted to user@impala.apache.org by Aleksei Maželis <ol...@gmail.com> on 2017/12/05 10:50:34 UTC

Impala's single point of failure

Hi,

I am referring to the section of the Impala FAQ about a single point of failure at
https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_faq.html#faq_ha__faq_spof
:

<quote>

> There is not a single point of failure in Impala. All Impala daemons are
> fully able to handle incoming queries. If a machine fails however, all
> queries with fragments running on that machine will fail. Because queries
> are expected to return quickly, you can just rerun the query if there is a
> failure. See Impala Concepts and Architecture for details about the Impala
> architecture.
> The longer answer: Impala must be able to connect to the Hive metastore.
> Impala aggressively caches metadata so the metastore host should have
> minimal load. Impala relies on the HDFS NameNode, and, in CDH4, you can
> configure HA for HDFS. Impala also has centralized services, known as the
> statestore and catalog services, that run on one host only. Impala
> continues to execute queries if the statestore host is down, but it will
> not get state updates. For example, if a host is added to the cluster while
> the statestore host is down, the existing instances of impalad running on
> the other hosts will not find out about this new host. Once the statestore
> process is restarted, all the information it serves is automatically
> reconstructed from all running Impala daemons.

</quote>

It appears that (despite the first sentence in the quote) the centralized
services (statestore and catalog) do represent a single point of failure.
Is that so, or am I missing something? If they are a single point of
failure, what is the workaround when high availability is a requirement?

Regards,
Aleksei Maželis

Re: Impala's single point of failure

Posted by Quanlong Huang <hu...@126.com>.

I have questions about this as well.

If catalogd restarts, its catalog version starts again from 0. However, the catalog version held by each Impala daemon does not change. As a result, the Impala daemons do not accept catalog updates until the catalog versions match, and the cluster is in an abnormal state during that time.

Thus, there's still a single point of failure in Impala. Please correct me if I'm wrong.
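
To make the scenario concrete, here is a toy Python sketch of the behaviour as I understand it. It is not Impala's code: ToyCatalogd, ToyImpalad and the rule "accept an update only if its version is higher than the cached one" are hypothetical names and assumptions used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyCatalogd:
    """Hypothetical stand-in for catalogd: hands out increasing versions."""
    version: int = 0

    def publish_update(self) -> int:
        # Each update carries the next catalog version.
        self.version += 1
        return self.version

    def restart(self) -> None:
        # As described above: after a restart the counter starts from 0 again.
        self.version = 0


@dataclass
class ToyImpalad:
    """Hypothetical stand-in for the catalog version cached by one daemon."""
    cached_version: int = 0

    def apply_update(self, update_version: int) -> bool:
        # Assumption: an update is accepted only if it is newer than the cache.
        if update_version <= self.cached_version:
            return False
        self.cached_version = update_version
        return True


def updates_rejected_after_restart(updates_before_restart: int) -> int:
    """Count the updates the daemon rejects after the toy catalogd restarts."""
    catalogd = ToyCatalogd()
    impalad = ToyImpalad()
    for _ in range(updates_before_restart):
        impalad.apply_update(catalogd.publish_update())

    catalogd.restart()

    rejected = 0
    while not impalad.apply_update(catalogd.publish_update()):
        rejected += 1
    return rejected
```

Under these assumptions, the more updates the daemon saw before the restart, the longer it keeps rejecting new ones afterwards; calling updates_rejected_after_restart with different counts shows how long the "abnormal" window lasts in this toy model.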

Thanks,
Quanlong
