Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2021/08/20 07:02:00 UTC

[jira] [Created] (IMPALA-10875) Transient stale catalog if catalogd is restarted more than once shortly

Quanlong Huang created IMPALA-10875:
---------------------------------------

             Summary: Transient stale catalog if catalogd is restarted more than once shortly
                 Key: IMPALA-10875
                 URL: https://issues.apache.org/jira/browse/IMPALA-10875
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog
            Reporter: Quanlong Huang


This is a follow-up to IMPALA-5476. Though it is rare in practice, there is still a bug where a client can see a stale catalog in the following scenario:
 * Catalogd is restarted twice within one statestore catalog update cycle.
 * A DDL finishes its execDdl RPC on the second restarted catalogd. The response carries a new catalog service id that differs from the coordinator's local one, so the DDL execution thread waits until the local one is updated.
 * The coordinator receives a catalog update from the first restarted catalogd. The local catalog service id changes, which wakes up the DDL execution thread.
 * The DDL execution thread finds that the local catalog service id still differs from the id of the catalogd that executed the DDL, so it ignores the DDL result and returns.

The client will see a stale catalog until the next catalog topic update arrives.
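
The sequence above can be modelled with a small, single-threaded sketch. This is only an illustration of the id comparison described in the scenario, not Impala's actual code: the Coordinator class, its methods, the string service ids, and the assumption that the first restarted catalogd still reports 'join_aa' are all hypothetical.

```python
# Toy model of the coordinator-side check described above.
# All names here are hypothetical; they do not mirror Impala's real classes.


class Coordinator(object):
    def __init__(self, service_id, tables):
        self.local_service_id = service_id
        self.tables = set(tables)

    def on_topic_update(self, service_id, tables):
        """Apply a statestore catalog update sent by the given catalogd instance."""
        self.local_service_id = service_id
        self.tables = set(tables)

    def finish_ddl(self, ddl_service_id, dropped_table):
        """Runs after the DDL thread is woken by a change of local_service_id.

        The DDL result is applied only if the local catalog service id matches
        the id of the catalogd that executed the DDL; otherwise it is ignored.
        """
        if ddl_service_id != self.local_service_id:
            return False
        self.tables.discard(dropped_table)
        return True


def simulate_double_restart():
    # The coordinator starts in sync with the original catalogd.
    coord = Coordinator("catalogd-0", {"join_aa"})

    # Catalogd is restarted twice ("catalogd-1", then "catalogd-2").
    # The DROP TABLE is executed by the second restarted instance.
    ddl_service_id = "catalogd-2"

    # The update from the first restarted instance arrives first
    # (assumed here to still contain 'join_aa') and wakes the DDL thread.
    coord.on_topic_update("catalogd-1", {"join_aa"})

    # Ids still differ, so the DDL result is dropped.
    applied = coord.finish_ddl(ddl_service_id, dropped_table="join_aa")
    stale = "join_aa" in coord.tables

    # The next topic update, from the second instance, repairs the view.
    coord.on_topic_update("catalogd-2", set())
    recovered = "join_aa" not in coord.tables
    return applied, stale, recovered


if __name__ == "__main__":
    print(simulate_double_restart())
```

In this model, simulate_double_restart() returns (False, True, True): the DDL result is not applied, the dropped table is still visible, and the view only recovers once the update from the second restarted catalogd arrives.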

The following test reproduces the bug (add it to tests/custom_cluster/test_restart_services.py):
{code:python}
  UPDATE_FREQUENCY_S = 10

  @pytest.mark.execute_serially
  @CustomClusterTestSuite.with_args(
    statestored_args="--statestore_update_frequency_ms={frequency_ms}"
    .format(frequency_ms=(UPDATE_FREQUENCY_S * 1000)))
  def test_restart_catalogd_twice2(self):
    self.execute_query_expect_success(self.client, "drop table if exists join_aa")
    self.execute_query_expect_success(self.client, "create table join_aa(id int)")
    # Make the catalog object version grow large enough
    self.execute_query_expect_success(self.client, "invalidate metadata")
    # It does not matter whether the DDLs succeed; they are only issued to make
    # the local catalog cache of the impalad go out of sync.
    for i in range(0, 10):
      try:
        query = "alter table join_aa add columns (age" + str(i) + " int)"
        self.execute_query_async(query)
      except Exception as e:
        LOG.info(str(e))
    self.cluster.catalogd.restart()
    sleep(self.UPDATE_FREQUENCY_S * 2)
    self.cluster.catalogd.restart()
    self.execute_query_expect_success(self.client, "drop table join_aa")
    # Should not see stale metadata on 'join_aa'
    result = self.execute_query_expect_success(self.client, "show tables")
    assert 'join_aa' not in result.data
{code}


