You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Sahil Takiar (Jira)" <ji...@apache.org> on 2020/04/07 00:15:00 UTC

[jira] [Comment Edited] (IMPALA-9610) Crashed statestore causes queries to hang

    [ https://issues.apache.org/jira/browse/IMPALA-9610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076767#comment-17076767 ] 

Sahil Takiar edited comment on IMPALA-9610 at 4/7/20, 12:14 AM:
----------------------------------------------------------------

Something weird happens to impala-shell as well:
{code:java}
[localhost:21000] default> select count(l_comment) from tpch.lineitem;
Query: select count(l_comment) from tpch.lineitem
Query submitted at: 2020-04-06 17:00:47 (Coordinator: http://stakiar-desktop:25000)
^C^C
[localhost:21000] default> ^C
[localhost:21000] default> ^C
[localhost:21000] default> quit;
^C^C
[localhost:21000] default> ^C
[localhost:21000] default> ^C
[localhost:21000] default> ^C
[localhost:21000] default> quit;

 {code}

I can't quit the impala-shell, I have to kill the process externally to exit.


was (Author: stakiar):
Something weird happens to impala-shell as well:
{code:java}
[localhost:21000] default> select count(l_comment) from tpch.lineitem;
Query: select count(l_comment) from tpch.lineitem
Query submitted at: 2020-04-06 17:00:47 (Coordinator: http://stakiar-desktop:25000)
^C^C
[localhost:21000] default> ^C
[localhost:21000] default> ^C
[localhost:21000] default> quit;
^C^C
[localhost:21000] default> ^C
[localhost:21000] default> ^C
[localhost:21000] default> ^C
[localhost:21000] default> quit;

 {code}

I can't quit the impal-shell, I have to kill the process externally to exit.

> Crashed statestore causes queries to hang
> -----------------------------------------
>
>                 Key: IMPALA-9610
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9610
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>
> The following steps on master cause a query to hang:
> {code:java}
> ./bin/start-impala-cluster.py
> ps aux | grep -i statestore
> kill -9 8968
> # wait until the impala coordinator has noticed that the statestore has been killed (e.g. wait for statestore-subscriber.connected to be False)
> ./bin/impala-shell.sh
> [localhost:21000] default> select count(l_comment) from tpch.lineitem;
> {code}
> The query hangs in the "CREATED" state.
> From the coordinator logs:
> {code:java}
> I0406 16:46:51.677052 10058 statestore-subscriber.cc:300] impalad@stakiar-desktop:22000: Connection with statestore lost, entering recovery mode
> I0406 16:46:51.677093 10058 statestore-subscriber.cc:304] Trying to re-register with statestore, attempt: 1
> I0406 16:46:51.677412 10058 client-cache.cc:82] ReopenClient(): re-creating client for localhost:24000
> I0406 16:46:51.677911 10058 thrift-util.cc:96] TSocket::open() connect() <Host: localhost Port: 24000>Connection refused
> I0406 16:46:51.678071 10058 thrift-client.cc:78] Couldn't open transport for localhost:24000 (connect() failed: Connection refused)
> I0406 16:46:51.678092 10058 thrift-client.cc:94] Unable to connect to localhost:24000
> I0406 16:46:51.678114 10058 client-cache.h:387] RPC client failed to connect: Couldn't open transport for localhost:24000 (connect() failed: Connection refused)
>  rpc: N6impala27TRegisterSubscriberResponseE
> I0406 16:46:51.678135 10058 client-cache.h:314] RPC to localhost:24000 failed RPC client failed to connect: Couldn't open transport for localhost:24000 (connect() failed: Connection refused)
> I0406 16:46:51.678151 10058 client-cache.cc:174] Broken Connection, destroy client for localhost:24000
> I0406 16:46:54.678651 10058 thrift-util.cc:96] TSocket::open() connect() <Host: localhost Port: 24000>Connection refused
> I0406 16:46:54.678774 10058 thrift-client.cc:78] Couldn't open transport for localhost:24000 (connect() failed: Connection refused)
> I0406 16:46:54.678794 10058 thrift-client.cc:94] Unable to connect to localhost:24000
> ...
> {code}
> The "Unable to connect to localhost:24000" pattern continues indefinitely (presumably until the statestore comes back up).
> When the coordinator gets a query from the client, it is able to successfully parse / plan it, but hangs when trying to load metadata:
> {code:java}
> I0406 16:47:11.461125 11181 impala-server.cc:1053] 3c4fe54fcc25e7db:99c9bd7b00000000] Registered query query_id=3c4fe54fcc25e7db:99c9bd7b00000000 session_id=d8416b340cdf66b6:fc432571b1d658bb
> I0406 16:47:11.521435 11181 Frontend.java:1490] 3c4fe54fcc25e7db:99c9bd7b00000000] Analyzing query: select count(l_comment) from tpch.lineitem db: default
> I0406 16:47:11.627283 11181 FeSupport.java:317] 3c4fe54fcc25e7db:99c9bd7b00000000] Requesting prioritized load of table(s): tpch.lineitem
> ...
> I0406 16:47:31.648221 11181 StmtMetadataLoader.java:202] 3c4fe54fcc25e7db:99c9bd7b00000000] Waiting for table metadata. Waited for 10 catalog updates and 20020ms. Tables remaining: [tpch.lineitem]
> ...
> I0406 16:52:31.696449 11181 StmtMetadataLoader.java:202] 3c4fe54fcc25e7db:99c9bd7b00000000] Waiting for table metadata. Waited for 160 catalog updates and 320069ms. Tables remaining: [tpch.lineitem]
> ....
> I0406 16:52:33.697089 11181 StmtMetadataLoader.java:223] 3c4fe54fcc25e7db:99c9bd7b00000000] Re-sending prioritized load request. Waited for 160 catalog updates and 322069ms.
> I0406 16:52:33.697333 11181 FeSupport.java:317] 3c4fe54fcc25e7db:99c9bd7b00000000] Requesting prioritized load of table(s): tpch.lineitem
> {code}
> The catalogd logs show the following:
> {code:java}
> I0406 16:45:58.999001  9032 logging-support.cc:299] Old log file deleted during log rotation: /home/stakiar/Impala/logs/cluster/catalogd.stakiar-desktop.stakiar.log.ERROR.20200406-095134.8408
> I0406 16:46:51.311251  9241 statestore-subscriber.cc:300] catalog-server@stakiar-desktop:26000: Connection with statestore lost, entering recovery mode
> I0406 16:46:51.311297  9241 statestore-subscriber.cc:304] Trying to re-register with statestore, attempt: 1
> I0406 16:46:51.311861  9241 client-cache.cc:82] ReopenClient(): re-creating client for localhost:24000
> I0406 16:46:51.312443  9241 thrift-util.cc:96] TSocket::open() connect() <Host: localhost Port: 24000>Connection refused
> I0406 16:46:51.312660  9241 thrift-client.cc:78] Couldn't open transport for localhost:24000 (connect() failed: Connection refused)
> I0406 16:46:51.312695  9241 thrift-client.cc:94] Unable to connect to localhost:24000
> I0406 16:46:51.312721  9241 client-cache.h:387] RPC client failed to connect: Couldn't open transport for localhost:24000 (connect() failed: Connection refused)
>  rpc: N6impala27TRegisterSubscriberResponseE
> I0406 16:46:51.312741  9241 client-cache.h:314] RPC to localhost:24000 failed RPC client failed to connect: Couldn't open transport for localhost:24000 (connect() failed: Connection refused)
> I0406 16:46:51.312757  9241 client-cache.cc:174] Broken Connection, destroy client for localhost:24000
> I0406 16:46:54.313377  9241 thrift-util.cc:96] TSocket::open() connect() <Host: localhost Port: 24000>Connection refused
> I0406 16:46:54.313537  9241 thrift-client.cc:78] Couldn't open transport for localhost:24000 (connect() failed: Connection refused)
> I0406 16:46:54.313561  9241 thrift-client.cc:94] Unable to connect to localhost:24000
> ...
> I0406 16:47:11.642450 13461 TableLoader.java:60] Loading metadata for: tpch.lineitem (background load)
> I0406 16:47:11.642679  9211 TableLoadingMgr.java:71] Loading metadata for table: tpch.lineitem
> I0406 16:47:11.642757  9211 TableLoadingMgr.java:73] Remaining items in queue: 0. Loads in progress: 1
> I0406 16:47:11.668300 13461 HdfsTable.java:1004] Loading metadata for table definition and all partition(s) of tpch.lineitem (background load)
> W0406 16:47:11.690783 13461 HiveConf.java:5706] HiveConf of name hive.sentry.conf.url does not exist
> I0406 16:47:11.815637 13461 Table.java:420] Load Valid Write Id List Done. Time taken: 15.826us
> I0406 16:47:11.815695 13461 HdfsTable.java:1037] Fetching partition metadata from the Metastore: tpch.lineitem
> I0406 16:47:11.822021 13461 HdfsTable.java:1044] Fetched partition metadata from the Metastore: tpch.lineitem
> I0406 16:47:11.903692 13461 HdfsTable.java:700] Loaded file and block metadata for tpch.lineitem partitions: . Time taken: 33.599ms
> I0406 16:47:11.903905 13461 TableLoader.java:103] Loaded metadata for: tpch.lineitem (261ms)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org