Posted to issues-all@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2018/10/05 00:20:00 UTC

[jira] [Comment Edited] (IMPALA-7665) Bringing up stopped statestore causes queries to fail

    [ https://issues.apache.org/jira/browse/IMPALA-7665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16639086#comment-16639086 ] 

Tim Armstrong edited comment on IMPALA-7665 at 10/5/18 12:19 AM:
-----------------------------------------------------------------

I think the issue is in ImpalaServer::MembershipCallback(), where the code that detects the set of impala daemons doesn't distinguish between impala daemons removed from an existing statestore topic and impala daemons that have not yet been added to the new statestore topic on the fresh statestore.

I'm not sure exactly what the solution is. Probably we need to avoid cancelling queries based on the initial statestore updates from a fresh statestore. We still need to cancel queries if the other impalad doesn't appear in the statestore membership after some interval so we can distinguish between impalads that went away and impalads that haven't registered yet.
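
To make the "after some interval" idea concrete, here is a minimal sketch, not Impala code. MembershipGracePeriod, Update() and the 30-second grace period used below are hypothetical names and values chosen for illustration. As a simplification it applies the grace period to every absence rather than only to the first updates from a fresh statestore: a backend is reported as failed only once it has been missing from the membership for the whole interval.

```cpp
#include <cassert>
#include <chrono>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical helper (not part of Impala): tracks how long each backend that
// is still needed by running queries has been absent from the membership.
class MembershipGracePeriod {
 public:
  explicit MembershipGracePeriod(Clock::duration grace) : grace_(grace) {
    assert(grace >= Clock::duration::zero());
  }

  // Call once per membership update. 'members' is the membership reported by
  // the statestore, 'in_use' is the set of backends running fragments of
  // active queries. Returns the in-use backends that have been missing for at
  // least the grace period, in sorted order.
  std::vector<std::string> Update(const std::set<std::string>& members,
      const std::set<std::string>& in_use, Clock::time_point now) {
    std::vector<std::string> failed;
    for (const std::string& backend : in_use) {
      if (members.count(backend) > 0) {
        // Backend is registered (again): forget any earlier absence.
        first_missing_.erase(backend);
        continue;
      }
      // Record the first time we saw it missing; emplace keeps an older entry.
      auto it = first_missing_.emplace(backend, now).first;
      if (now - it->second >= grace_) failed.push_back(backend);
    }
    // Stop tracking backends that no running query depends on any more.
    for (auto it = first_missing_.begin(); it != first_missing_.end();) {
      if (in_use.count(it->first) == 0) {
        it = first_missing_.erase(it);
      } else {
        ++it;
      }
    }
    return failed;
  }

 private:
  Clock::duration grace_;
  std::unordered_map<std::string, Clock::time_point> first_missing_;
};
```

With something like this, the first (possibly empty) update from a restarted statestore would only start the timers. Queries would be cancelled only for impalads that are still missing once the interval has passed, which covers both impalads that went away and impalads that never re-register.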



was (Author: tarmstrong):
I think the issue is in ImpalaServer::MembershipCallback(), where the code that detects the set of impala daemons doesn't distinguish between impala daemons removed from an existing statestore topic and impala daemons that have not yet been added to the new statestore topic on the fresh statestore.

I'm not sure exactly what the solution is. Probably we need to avoid cancelling queries based on the initial statestore updates from a fresh statestore. We still need to cancel queries if the other impalad doesn't appear in the statestore membership after some interval so we can distinguish between impalads that went away and impalads that haven't registered yet.

We may want to wait until IMPALA-2990 is fixed in case this leads to more queries getting stuck in the IMPALA-2990 state.

> Bringing up stopped statestore causes queries to fail
> -----------------------------------------------------
>
>                 Key: IMPALA-7665
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7665
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>    Affects Versions: Impala 3.1.0
>            Reporter: Tim Armstrong
>            Priority: Major
>              Labels: statestore
>
> I can reproduce this by running a long-running query then cycling the statestore:
> {noformat}
> tarmstrong@tarmstrong-box:~/Impala/incubator-impala$ impala-shell.sh -q "select distinct * from tpch10_parquet.lineitem"
> Starting Impala Shell without Kerberos authentication
> Connected to localhost:21000
> Server version: impalad version 3.1.0-SNAPSHOT DEBUG (build c486fb9ea4330e1008fa9b7ceaa60492e43ee120)
> Query: select distinct * from tpch10_parquet.lineitem
> Query submitted at: 2018-10-04 17:06:48 (Coordinator: http://tarmstrong-box:25000)
> {noformat}
> If I kill the statestore, the query runs fine, but if I start up the statestore again, it fails.
> {noformat}
> # In one terminal, start up the statestore
> $ /home/tarmstrong/Impala/incubator-impala/be/build/latest/statestore/statestored -log_filename=statestored -log_dir=/home/tarmstrong/Impala/incubator-impala/logs/cluster -v=1 -logbufsecs=5 -max_log_files=10
> # The running query then fails
> WARNINGS: Failed due to unreachable impalad(s): tarmstrong-box:22001, tarmstrong-box:22002
> {noformat}
> Note that I've seen different subsets of impalads reported as failed, e.g. "Failed due to unreachable impalad(s): tarmstrong-box:22001"


