Posted to issues-all@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2019/08/28 19:15:00 UTC
[jira] [Updated] (IMPALA-8904) Daemons fails fast when statestore has not started up

     [ https://issues.apache.org/jira/browse/IMPALA-8904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong updated IMPALA-8904:
----------------------------------
    Description: 
If you start the statestored and the other services at the same time, there is a race between the statestore starting and the other services trying to register with it. If the other services "win" the race, they abort startup because they can't register with the statestore.

The log looks like this:
{noformat}
I0828 00:19:10.460000     1 statestore-subscriber.cc:219] Starting statestore subscriber
I0828 00:19:10.461310     1 thrift-server.cc:451] ThriftServer 'StatestoreSubscriber' started on port: 23000
I0828 00:19:10.461320     1 statestore-subscriber.cc:247] Registering with statestore
I0828 00:19:10.461309   299 TAcceptQueueServer.cpp:314] connection_setup_thread_pool_size is set to 2
I0828 00:19:10.462744     1 statestore-subscriber.cc:253] statestore registration unsuccessful: RPC Error: Client for statestored:24000 hit an unexpected exception: No more data to read., type: N6apache6thrift9transport19TTransportExceptionE, rpc: N6impala27TRegisterSubscriberResponseE, send: done
E0828 00:19:10.462818     1 impalad-main.cc:90] Impalad services did not start correctly, exiting.  Error: RPC Error: Client for statestored:24000 hit an unexpected exception: No more data to read., type: N6apache6thrift9transport19TTransportExceptionE, rpc: N6impala27TRegisterSubscriberResponseE, send: done
Statestore subscriber did not start up.
{noformat}

Most management systems will automatically restart failed processes, so typically the impalads will come back up and find the statestore, but the crash loop is unnecessary.

I propose that the services should retry for a while before giving up (we still want the services to fail when there genuinely isn't a statestore available).
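
As a rough illustration of the proposal, here is a minimal sketch of "retry for a while, then give up". It is written in Python for brevity and is not Impala's implementation: register_with_retry, RegistrationError, and all timeout and backoff values are hypothetical names and numbers chosen for the example.

```python
import time
from typing import Callable


class RegistrationError(Exception):
    """Hypothetical error raised when one registration attempt fails."""


def register_with_retry(
    register: Callable[[], None],
    timeout_s: float = 60.0,          # assumed overall deadline, not a real flag
    initial_backoff_s: float = 0.5,   # assumed first wait between attempts
    max_backoff_s: float = 5.0,       # assumed cap on the wait
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call register() until it succeeds or timeout_s has elapsed.

    Returns the number of attempts made. Once the deadline has passed,
    the last RegistrationError is re-raised so the caller still fails
    when no statestore is available.
    """
    deadline = clock() + timeout_s
    backoff = initial_backoff_s
    attempts = 0
    while True:
        attempts += 1
        try:
            register()
            return attempts
        except RegistrationError:
            remaining = deadline - clock()
            if remaining <= 0:
                raise  # give up: the deadline has passed
            # Wait before retrying, but never sleep past the deadline.
            sleep(min(backoff, remaining))
            backoff = min(backoff * 2, max_backoff_s)
```

The deadline keeps the "fail when there genuinely isn't a statestore" behaviour, while the capped exponential backoff avoids hammering a statestore that is still starting up. The sleep and clock parameters exist only so the loop can be exercised without real waiting.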

  was:
If you start the statestored and the other services at the same time, there is a race between the statestore starting and the other services trying to register with it. If the other services "win" the race, they abort startup because they can't register with the statestore.

The log looks like this:
{noformat}
I0828 00:19:10.460000     1 statestore-subscriber.cc:219] Starting statestore subscriber
I0828 00:19:10.461310     1 thrift-server.cc:451] ThriftServer 'StatestoreSubscriber' started on port: 23000
I0828 00:19:10.461320     1 statestore-subscriber.cc:247] Registering with statestore
I0828 00:19:10.461309   299 TAcceptQueueServer.cpp:314] connection_setup_thread_pool_size is set to 2
I0828 00:19:10.462744     1 statestore-subscriber.cc:253] statestore registration unsuccessful: RPC Error: Client for statestored:24000 hit an unexpected exception: No more data to read., type: N6apache6thrift9transport19TTransportExceptionE, rpc: N6impala27TRegisterSubscriberResponseE, send: done
E0828 00:19:10.462818     1 impalad-main.cc:90] Impalad services did not start correctly, exiting.  Error: RPC Error: Client for statestored:24000 hit an unexpected exception: No more data to read., type: N6apache6thrift9transport19TTransportExceptionE, rpc: N6impala27TRegisterSubscriberResponseE, send: done
Statestore subscriber did not start up.
{noformat}

I propose that the services should retry for a while before giving up (we still want the services to fail when there genuinely isn't a statestore available).


> Daemons fails fast when statestore has not started up
> -----------------------------------------------------
>
>                 Key: IMPALA-8904
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8904
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>    Affects Versions: Impala 3.1.0, Impala 3.2.0, Impala 3.3.0
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Major



