Posted to reviews@impala.apache.org by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org> on 2019/01/23 00:22:56 UTC

[Impala-ASF-CR] IMPALA-5872: Testcase builder for query planner

Hello Greg Rahn, Paul Rogers, Balazs Jeszenszky, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12221

to look at the new patch set (#2).

Change subject: IMPALA-5872: Testcase builder for query planner
......................................................................

IMPALA-5872: Testcase builder for query planner

Implements a new testcase builder for simulating query plans
from one cluster on a different cluster/minicluster with a
different number of nodes. The testcase is collected from one
cluster and can be replayed on any other cluster. It includes
all the information needed to replay the query plan exactly as
on the source cluster.

Also adds a stand-alone tool (PlannerTestCaseLoader) that can
replay the testcase without having to start an actual cluster
or a dev minicluster. This is done to make testcase debugging
simpler.

Motivation:
----------
- Make query planner issues easily reproducible
- Improve user experience while collecting query diagnostics
- Make it easy to test new planner features by running them against
  customer use cases collected from much larger clusters.

Commands:
--------
-- Collect testcase for a query stmt (outputs the testcase file path).
impala-shell> COPY TESTCASE TO <hdfs dirpath> <query stmt>

-- Load the testcase metadata in a target cluster (dumps the query stmt)
impala-shell> COPY TESTCASE FROM <hdfs testcase file path>
-- Replay the query plan
impala-shell> SET PLANNER_DEBUG_MODE=true
impala-shell> EXPLAIN <query stmt>

How it works:
------------

- During export on the source cluster, the command dumps the thrift
  state of every object referenced in the query into a gzipped binary
  file.
- During replay on a target cluster, it adds these objects to the catalog
  cache by faking them as DDLs.
- The planner also fakes the number of hosts by using the scan range
  information from the target cluster.
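
The export/replay round trip described above can be sketched roughly as
follows. This is only an illustration, not the code in this patch: JSON
stands in for thrift serialization, a plain dict stands in for the
catalog cache, and the names dump_testcase and load_testcase are
hypothetical.

```python
import gzip
import json


def dump_testcase(path, objects):
    """Write metadata for the referenced objects to a gzipped binary file.

    Assumption: JSON is used here as a stand-in for thrift serialization.
    """
    payload = json.dumps(objects, sort_keys=True).encode("utf-8")
    with gzip.open(path, "wb") as f:
        f.write(payload)


def load_testcase(path, catalog_cache):
    """Read the dump back and add each object to an in-memory cache.

    Assumption: a dict keyed by (type, name) stands in for the catalog cache.
    """
    with gzip.open(path, "rb") as f:
        objects = json.loads(f.read().decode("utf-8"))
    for obj in objects:
        catalog_cache[(obj["type"], obj["name"])] = obj
    return catalog_cache
```

Because only metadata is written, the file stays small even for very
large tables, which matches the first caveat below.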

Caveats:
------
- The tool does not collect actual data files for the tables. Only the
  metadata state is dumped.
- Currently it only imports databases/tables/views. We can extend it to
  work for UDFs etc.
- It only works for QueryStmts (select/union queries).
- Once the metadata dump is loaded on a target cluster, the state is
  volatile, so it does not survive a cluster restart or an invalidate
  metadata.
- Replaying a loaded testcase requires setting the query option (SET
  PLANNER_DEBUG_MODE=true) so that the planner knows to fake the number
  of hosts. Otherwise the planner uses the local cluster topology.
- Cross-version compatibility of testcases needs some thought. For
  example, a testcase created on Impala version 3.2 may not replay
  correctly on Impala version 3.5 if we don't keep the underlying
  thrift structures backward compatible.
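
To illustrate the compatibility concern in the last caveat, here is a
minimal sketch of tolerant decoding: fields missing from an older dump
get a default, and fields unknown to the reader are ignored. The field
names and defaults are hypothetical and do not reflect Impala's actual
thrift structures.

```python
# Hypothetical fields the reader understands, with defaults used when an
# older testcase does not carry them.
KNOWN_FIELDS = {"name": None, "num_rows": -1, "owner": "unknown"}


def decode_table(record):
    """Decode a table record written by an older or newer version.

    Missing fields fall back to their defaults; unknown fields are dropped.
    """
    return {field: record.get(field, default)
            for field, default in KNOWN_FIELDS.items()}
```

A reader written this way accepts dumps from both older and newer
writers, which is the property the thrift structures would need to keep
for testcases to stay replayable across versions.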

Change-Id: Iec83eeb2dc5136768b70ed581fb8d3ed0335cb52
---
M be/src/service/client-request-state.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/CatalogService.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/JniCatalog.thrift
M common/thrift/Types.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
A fe/src/main/java/org/apache/impala/analysis/CopyTestCaseStmt.java
M fe/src/main/java/org/apache/impala/analysis/HdfsUri.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/common/JniUtil.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
A fe/src/test/java/org/apache/impala/testutil/PlannerTestCaseLoader.java
28 files changed, 598 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/12221/2
-- 
To view, visit http://gerrit.cloudera.org:8080/12221
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iec83eeb2dc5136768b70ed581fb8d3ed0335cb52
Gerrit-Change-Number: 12221
Gerrit-PatchSet: 2
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Balazs Jeszenszky <je...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>