Posted to reviews@impala.apache.org by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org> on 2019/08/16 13:25:51 UTC
[Impala-ASF-CR] IMPALA-8836: Support COMPUTE STATS on insert only ACID tables
Hello Yongzhi Chen, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/14066
to look at the new patch set (#7).
Change subject: IMPALA-8836: Support COMPUTE STATS on insert only ACID tables
......................................................................
IMPALA-8836: Support COMPUTE STATS on insert only ACID tables
For ACID tables, COMPUTE STATS needs to use a new HMS API, as the
old one is rejected by the metastore. This API currently has some
counterintuitive parts:
- setPartitionColumnStatistics is used to set table stats, as there
is no similar function exposed by HMS client for tables at the
moment.
- A new writeId is allocated for the stat change, and this needs
a transaction, so a transaction is opened/committed/aborted even
though this doesn't seem necessary. The Hive code seems to use
an internal API for this.
- Even though the HMS thrift Table object has a colStats field,
it is only applied during alter_table if there are other changes
like new columns in the tables, so alter_table couldn't be used
to change column stats.
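
The open/commit/abort flow around the stat change can be sketched as
follows. This is only a minimal illustration under stated assumptions,
not the code in this patch: TxnClient, StatsUpdate,
TransactionalStatsWriter and RecordingTxnClient are hypothetical names
made up for the example, and they do not mirror the real HMS client or
MetastoreShim signatures.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the transactional calls of a metastore client. */
interface TxnClient {
  long openTxn();
  long allocateWriteId(long txnId, String table);
  void commitTxn(long txnId);
  void abortTxn(long txnId);
}

/** Hypothetical callback that performs the stats update under a writeId. */
interface StatsUpdate {
  void apply(long txnId, long writeId) throws Exception;
}

final class TransactionalStatsWriter {
  /**
   * Opens a transaction, allocates a writeId for the table, runs the
   * update, and commits. If any step after openTxn() throws, the
   * transaction is aborted and the exception propagates to the caller.
   */
  static void updateStats(TxnClient client, String table, StatsUpdate update)
      throws Exception {
    long txnId = client.openTxn();
    boolean committed = false;
    try {
      long writeId = client.allocateWriteId(txnId, table);
      update.apply(txnId, writeId);
      client.commitTxn(txnId);
      committed = true;
    } finally {
      if (!committed) client.abortTxn(txnId);
    }
  }
}

/** In-memory fake (assumption for the example) that records the call order. */
final class RecordingTxnClient implements TxnClient {
  final List<String> calls = new ArrayList<>();
  private long nextTxnId = 1;
  private long nextWriteId = 1;

  public long openTxn() { calls.add("open"); return nextTxnId++; }
  public long allocateWriteId(long txnId, String table) {
    calls.add("allocate");
    return nextWriteId++;
  }
  public void commitTxn(long txnId) { calls.add("commit"); }
  public void abortTxn(long txnId) { calls.add("abort"); }
}
```

With the recording fake, a successful update produces the calls open,
allocate, commit, while an update that throws produces open, allocate,
abort.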
Additional changes:
- DROP STATS is no longer allowed for transactional tables, as it
turned out that there is no transactional version of the old API.
- Changed CatalogOpExecutor.updateCatalog() to get the writeIds
earlier. This can mean unnecessary HMS RPC calls if no property
change is needed in the end, but I found it hard to reason about
what happens if these RPC calls fail at their original location.
TODOs (my plan is to do these in IMPALA-8865):
- Tried to make the MetastoreShim API easier to use by adding a class
to encapsulate things like txnId and writeId, but it feels rather
half-baked and under-documented.
A similar class is added in https://gerrit.cloudera.org/#/c/14071/;
it would be good to merge them.
- The validWriteIdList of the original SELECT(s) behind COMPUTE
STATS could be used in the HMS API calls, but this would need
more plumbing.
Change-Id: I5c06b4678c1ff75c5aa1586a78afea563e64057f
---
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropStatsStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
A testdata/workloads/functional-query/queries/QueryTest/acid-compute-stats.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
M tests/query_test/test_acid.py
10 files changed, 405 insertions(+), 120 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/14066/7
--
To view, visit http://gerrit.cloudera.org:8080/14066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5c06b4678c1ff75c5aa1586a78afea563e64057f
Gerrit-Change-Number: 14066
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>