Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2023/06/21 05:22:00 UTC

[jira] [Commented] (IMPALA-12150) Use protocol version to isolate cluster components

    [ https://issues.apache.org/jira/browse/IMPALA-12150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17735534#comment-17735534 ] 

ASF subversion and git services commented on IMPALA-12150:
----------------------------------------------------------

Commit a82830896b590dd7a445f8f7e91f25356a7dd3c2 in impala's branch refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a82830896 ]

IMPALA-12150: Use protocol version to isolate cluster components

Some Thrift request/response structs in CatalogService were changed by
adding new fields in the middle of the struct, which caused a
cross-version incompatibility in CatalogService.

Impala cluster membership is managed by the statestore. During rolling
upgrades, where Impala daemons are upgraded one at a time, the upgraded
daemons use message formats that are incompatible with those of the
daemons not yet upgraded. Even though protocol version numbers were
already defined for the Statestore and Catalog services, they were not
used. The statestore and catalog server did not check the protocol
version in requests, which allowed incompatible Impala daemons to join
the same cluster. This caused unexpected query failures during rolling
upgrades.

We need a way to detect this and to enforce the following rules:
 - The statestore refuses registration requests from incompatible
   subscribers.
 - The catalog server refuses requests from incompatible clients.
 - The scheduler assigns tasks only to a group of compatible executors.

This patch isolates Impala daemons into separate clusters based on the
protocol version of the Statestore service, which prevents incompatible
Impala daemons from communicating with each other. It covers the Thrift
RPC communication between catalogd and coordinators, and between the
statestore and its subscribers (executors, coordinators, catalogd and
admissiond). The same mechanism should also work for future upgrades.
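
The registration check can be pictured with a minimal Python sketch. It
is not the Impala implementation (which is C++ over Thrift): the class
and field names below are hypothetical, and it assumes that any version
different from the statestore's own counts as incompatible, which the
commit message does not spell out.

```python
from dataclasses import dataclass
from enum import IntEnum


class StatestoreServiceVersion(IntEnum):
    # Mirrors the idea of the Thrift enum; values here are illustrative.
    V1 = 1
    V2 = 2


@dataclass
class RegisterRequest:
    subscriber_id: str
    protocol_version: StatestoreServiceVersion


@dataclass
class RegisterResponse:
    ok: bool
    error_msg: str = ""


class Statestore:
    """Toy statestore that only admits subscribers with a matching version."""

    def __init__(self, version: StatestoreServiceVersion):
        self.version = version
        self.subscribers = {}

    def register_subscriber(self, req: RegisterRequest) -> RegisterResponse:
        # Assumption: "incompatible" means "not equal to my own version".
        if req.protocol_version != self.version:
            return RegisterResponse(
                False,
                f"subscriber {req.subscriber_id} uses {req.protocol_version.name}, "
                f"statestore uses {self.version.name}",
            )
        self.subscribers[req.subscriber_id] = req
        return RegisterResponse(True)
```

With this shape, an old daemon never appears in the membership of a new
statestore, so the two sets of daemons end up in separate clusters.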

The following changes were made:
 - Bump StatestoreServiceVersion and CatalogServiceVersion to V2 for
   all requests of the Statestore and Catalog services.
 - Update the request and response structs in CatalogService to ensure
   that each Thrift request struct carries a protocol version and each
   Thrift response struct carries a returned status.
 - Update the request and response structs in StatestoreService to
   ensure that each Thrift request struct carries a protocol version
   and each Thrift response struct carries a returned status.
 - Add a subscriber type so that the statestore can distinguish
   different types of subscribers.
 - The statestore checks the protocol version of registration requests
   from subscribers and refuses requests with an incompatible version.
 - The catalog server checks the protocol version for Catalog service
   APIs and returns an error for requests with an incompatible version.
 - The catalog daemon sends its address and the protocol version of the
   Catalog service when it registers with the statestore. The
   statestore forwards the address and the protocol version of the
   Catalog service to all subscribers during registration.
 - Add an UpdateCatalogd API to the StatestoreSubscriber service so
   that coordinators can receive the address and the protocol version
   of the Catalog service from the statestore if they register with the
   statestore before the catalog daemon does.
 - Add a GetProtocolVersion API to the Statestore service so that
   subscribers can check the protocol version of the statestore before
   calling the RegisterSubscriber API.
 - Add the startup flag tolerate_statestore_startup_delay. It is off by
   default. When it is enabled, the subscriber tolerates a delay in the
   statestore's availability: the subscriber process does not exit if
   it cannot register with the specified statestore on startup.
   Instead it enters Recovery mode, where it loops, sleeps and retries
   until it successfully registers with the statestore. This flag
   should be enabled during rolling upgrades.
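
The behaviour of tolerate_statestore_startup_delay can be sketched as a
simple retry loop. This is an illustration in Python, not the Impala
code: the function name, the exception type and the retry interval are
hypothetical, and only the loop/sleep/retry behaviour is taken from the
description above.

```python
import time


class StatestoreUnavailable(Exception):
    """Hypothetical error raised when registration cannot be completed."""


def register_with_retry(try_register, tolerate_startup_delay,
                        retry_interval_s=1.0, sleep=time.sleep):
    """Call try_register() until it succeeds.

    With tolerate_startup_delay off, the first failure propagates, which
    stands in for the subscriber process exiting on startup. With it on,
    the caller stays in a recovery loop: sleep, then retry.
    """
    while True:
        try:
            try_register()
            return
        except StatestoreUnavailable:
            if not tolerate_startup_delay:
                raise
            sleep(retry_interval_s)
```

The sleep function is passed in only so the loop can be exercised
without real delays; a caller would normally leave it at its default.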

CatalogServiceVersion is defined in CatalogService.thrift. In the
future, if we make backward-incompatible changes to the request or
response structures of CatalogService APIs, we need to bump the
protocol version of the Catalog service.
StatestoreServiceVersion is defined in StatestoreService.thrift.
Similarly, if we make backward-incompatible changes to the request or
response structures of StatestoreService APIs, we need to bump the
protocol version of the Statestore service.
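
As a rough illustration of what a bumped CatalogServiceVersion means at
the RPC level, the Python sketch below shows a request that carries a
protocol version and a response that always carries a status. The
struct and function names are hypothetical, and treating any version
other than the server's own as incompatible is an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum


class CatalogServiceVersion(IntEnum):
    # Illustrative stand-in for the enum in CatalogService.thrift.
    V1 = 1
    V2 = 2


@dataclass
class Status:
    ok: bool
    msg: str = ""


@dataclass
class CatalogRequest:
    protocol_version: CatalogServiceVersion
    payload: str


@dataclass
class CatalogResponse:
    status: Status
    result: str = ""


# Version the (hypothetical) server was built with.
SERVER_VERSION = CatalogServiceVersion.V2


def handle_request(req: CatalogRequest) -> CatalogResponse:
    # Assumption: a version mismatch is reported in the response status
    # instead of being parsed with a possibly different struct layout.
    if req.protocol_version != SERVER_VERSION:
        return CatalogResponse(Status(
            False,
            f"incompatible client version {req.protocol_version.name}, "
            f"server is {SERVER_VERSION.name}"))
    return CatalogResponse(Status(True), result=f"handled {req.payload}")
```

Because every response has a status, a client built against the other
version receives an explicit error rather than a misread message.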

Message formats for KRPC communication between coordinators and
executors, and between admissiond and coordinators, are defined in
proto files under common/protobuf. If we make backward-incompatible
changes to these structures, we need to bump the protocol version of
the Statestore service.

Testing:
 - Added end-to-end unit tests.
 - Passed the core tests.
 - Ran a manual test to verify that old-version executors cannot
   register with a new-version statestore, and that new-version
   executors cannot register with an old-version statestore.

Change-Id: If61506dab38c4d1c50419c1b3f7bc4f9ee3676bc
Reviewed-on: http://gerrit.cloudera.org:8080/19959
Reviewed-by: Andrew Sherman <as...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Use protocol version to isolate cluster components
> --------------------------------------------------
>
>                 Key: IMPALA-12150
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12150
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Wenzhe Zhou
>            Assignee: Wenzhe Zhou
>            Priority: Major
>
> Some Thrift request/response structs in CatalogService were changed by adding new fields in the middle of the struct (as in CDPD-56557), which caused a cross-version incompatibility in CatalogService.
> Although protocol versions are defined for the Statestore and Catalog services, the statestore and catalog server currently do not check the protocol version in requests, so incompatible Impala daemons can join the same cluster. This causes unexpected query failures during rolling upgrades.
> For rolling upgrades, we need to isolate Impala daemons into separate clusters based on the protocol version of the Statestore service, so that incompatible Impala daemons cannot communicate with each other.

