Posted to issues-all@impala.apache.org by "Joe McDonnell (Jira)" <ji...@apache.org> on 2020/03/24 00:26:00 UTC

[jira] [Assigned] (IMPALA-9549) Impalad startup fails to wait for catalogd to startup when using local catalog

     [ https://issues.apache.org/jira/browse/IMPALA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe McDonnell reassigned IMPALA-9549:
-------------------------------------

    Assignee: Joe McDonnell

> Impalad startup fails to wait for catalogd to startup when using local catalog
> ------------------------------------------------------------------------------
>
>                 Key: IMPALA-9549
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9549
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Critical
>
> Since Impala coordinators and executors may start at the same time as the catalogd, they should tolerate delays in catalogd startup. When using the local catalog (use_local_catalog=true), the impalads fail with the following error if catalogd startup is delayed:
> {noformat}
> I0323 14:22:03.151849 29565 jni-util.cc:288] org.apache.impala.catalog.local.LocalCatalogException: Unable to load database names
> I0323 14:22:03.151849 29565 jni-util.cc:288] org.apache.impala.catalog.local.LocalCatalogException: Unable to load database names
>  at org.apache.impala.catalog.local.LocalCatalog.loadDbs(LocalCatalog.java:94)
>  at org.apache.impala.catalog.local.LocalCatalog.getDbs(LocalCatalog.java:83)
>  at org.apache.impala.service.Frontend.getCatalogMetrics(Frontend.java:753)
>  at org.apache.impala.service.JniFrontend.getCatalogMetrics(JniFrontend.java:220)
> Caused by: org.apache.thrift.TException: org.apache.impala.common.InternalException: Couldn't open transport for localhost:26000 (connect() failed: Connection refused)
>  at org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:382)
>  at org.apache.impala.catalog.local.CatalogdMetaProvider.access$100(CatalogdMetaProvider.java:174)
>  at org.apache.impala.catalog.local.CatalogdMetaProvider$1.call(CatalogdMetaProvider.java:583)
>  at org.apache.impala.catalog.local.CatalogdMetaProvider$1.call(CatalogdMetaProvider.java:578)
>  at org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:509)
>  at org.apache.impala.catalog.local.CatalogdMetaProvider.loadDbList(CatalogdMetaProvider.java:577)
>  at org.apache.impala.catalog.local.LocalCatalog.loadDbs(LocalCatalog.java:92)
>  ... 3 more
> Caused by: org.apache.impala.common.InternalException: Couldn't open transport for localhost:26000 (connect() failed: Connection refused)
>  at org.apache.impala.service.FeSupport.NativeGetPartialCatalogObject(Native Method)
>  at org.apache.impala.service.FeSupport.GetPartialCatalogObject(FeSupport.java:440)
>  at org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:380)
>  ... 9 more
> I0323 14:22:03.217051 29565 status.cc:126] LocalCatalogException: Unable to load database names
> CAUSED BY: TException: org.apache.impala.common.InternalException: Couldn't open transport for localhost:26000 (connect() failed: Connection refused){noformat}
> The cause is that the ImpalaServer constructor calls ImpalaServer::UpdateCatalogMetrics() ([https://github.com/apache/impala/blob/3b833902519fb8f0ef9b5fd20919c5fd85d22fcf/be/src/service/impala-server.cc#L452] ). UpdateCatalogMetrics() maintains two metrics that track the number of databases and the number of tables. It ends up calling org.apache.impala.catalog.local.LocalCatalog.getDbs(), which calls loadDbs() ([https://github.com/apache/impala/blob/ca0785ec206f27f06d8d6fd1b710779e548bbd8e/fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java#L83] ). loadDbs() requires a connection to catalogd and fails if it cannot connect.
> Importantly, all of this happens before ImpalaServer::Start() waits for the catalogd to start up:
> {code:cpp}
> if (FLAGS_is_coordinator) exec_env_->frontend()->WaitForCatalog();
> {code}
>  
> In the old catalog implementation (use_local_catalog=false), getDbs() returns whatever values the catalog currently holds and does not try to contact the catalogd, which is why that configuration does not hit this problem.
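
The ordering problem described above can be illustrated with a small, self-contained C++ model. This is only a sketch under stated assumptions: FakeCatalogd, FakeFrontend, CountDbsBeforeWait and CountDbsAfterWait are hypothetical names invented for illustration. They are not Impala classes, and the "wait first" ordering is one possible approach, not necessarily the fix adopted for this issue.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for catalogd: it refuses connections until it has
// been polled `polls_until_up` times.
class FakeCatalogd {
 public:
  explicit FakeCatalogd(int polls_until_up) : polls_until_up_(polls_until_up) {
    assert(polls_until_up >= 0);
  }

  // Returns true once the simulated service accepts connections.
  bool Ping() {
    if (polls_until_up_ > 0) --polls_until_up_;
    return polls_until_up_ == 0;
  }

  // Throws while the simulated service is still starting up.
  std::vector<std::string> ListDbs() const {
    if (polls_until_up_ > 0) {
      throw std::runtime_error("connect() failed: Connection refused");
    }
    return {"default", "functional"};
  }

 private:
  int polls_until_up_;
};

// Hypothetical model of the two catalog modes on the impalad side.
class FakeFrontend {
 public:
  FakeFrontend(FakeCatalogd* catalogd, bool use_local_catalog)
      : catalogd_(catalogd), use_local_catalog_(use_local_catalog) {}

  // Local catalog: fetch from catalogd on demand, so this can throw.
  // Old catalog: return whatever is cached, possibly nothing.
  std::vector<std::string> GetDbs() const {
    if (use_local_catalog_) return catalogd_->ListDbs();
    return cached_dbs_;
  }

  // Polls until catalogd answers. A real implementation would sleep
  // between attempts and probably enforce a timeout.
  void WaitForCatalog() {
    while (!catalogd_->Ping()) {
    }
  }

 private:
  FakeCatalogd* catalogd_;
  bool use_local_catalog_;
  std::vector<std::string> cached_dbs_;
};

// Models the reported ordering: the database count is read before any wait.
// Returns the count, or -1 if catalogd could not be reached.
int CountDbsBeforeWait(const FakeFrontend& fe) {
  try {
    return static_cast<int>(fe.GetDbs().size());
  } catch (const std::runtime_error&) {
    return -1;
  }
}

// One possible alternative ordering: wait for catalogd, then read the count.
int CountDbsAfterWait(FakeFrontend& fe) {
  fe.WaitForCatalog();
  return static_cast<int>(fe.GetDbs().size());
}
```

With a FakeCatalogd that needs three polls before it is up, CountDbsBeforeWait() returns -1 for a local-catalog frontend and 0 for an old-catalog frontend (empty cache, no error), while CountDbsAfterWait() on the local-catalog frontend returns 2.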


