Posted to reviews@impala.apache.org by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org> on 2019/07/03 14:27:52 UTC

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Hello Csaba Ringhofer, Todd Lipcon, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13559

to look at the new patch set (#11).

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................

IMPALA-8636: Implement INSERT for insert-only ACID tables

This commit adds INSERT support for insert-only ACID tables.

The Frontend opens a transaction for queries that refer to
transactional tables. For INSERT statements that write insert-only
ACID tables it also allocates a write ID. The Frontend aborts the
transaction if an error occurs during analysis/planning.

The Backend gets the transaction id in TExecRequestState and the
write id is set for the HDFS table sinks. The sinks write the files
directly to their final destination, which is an ACID base/delta
directory. There is no need for finalization of transactional INSERTs.
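
As a rough illustration of the base/delta destination described above,
the sketch below derives a directory name from a write id. The
zero-padded base_N / delta_N_N naming follows the Hive ACID convention
as commonly documented. The helper name acid_dir_name and the rule that
an overwrite maps to a base directory are assumptions made for
illustration and are not taken from this patch.

```python
def acid_dir_name(write_id: int, overwrite: bool = False) -> str:
    """Return a Hive-style ACID directory name for a write id.

    Hypothetical helper: the 7-digit zero padding mirrors the Hive ACID
    naming convention, and is an assumption about what the sink produces.
    """
    if write_id <= 0:
        raise ValueError("write_id must be a positive integer")
    if overwrite:
        # Assumed: an overwrite replaces the table contents, so it
        # would produce a new base directory.
        return "base_%07d" % write_id
    # A plain INSERT adds a delta covering exactly one write id.
    return "delta_%07d_%07d" % (write_id, write_id)


if __name__ == "__main__":
    print(acid_dir_name(5))
    print(acid_dir_name(5, overwrite=True))
```

Because the files land in a directory that readers only consider once
the transaction is committed, no separate move/rename step is required
in this sketch.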

ClientRequestState commits the transaction in WaitInternal() if
everything went well. If the transaction is still open in Done(), it
means there was an error, therefore the transaction needs to be aborted.

The Backend commits/aborts the transaction by calling the Frontend via
JNI.
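
The commit/abort flow described above can be sketched roughly as
follows. This is a simplified Python model and not the C++ code in the
patch: FakeFrontend stands in for the JNI calls into the Frontend, and
the names RequestState, wait_internal, done and run_query are
hypothetical.

```python
class FakeFrontend:
    """Hypothetical stand-in for the Frontend reached via JNI."""

    def __init__(self):
        self.committed = []
        self.aborted = []

    def commit_transaction(self, txn_id):
        self.committed.append(txn_id)

    def abort_transaction(self, txn_id):
        self.aborted.append(txn_id)


class RequestState:
    """Simplified model of the transaction handling per request."""

    def __init__(self, frontend, txn_id=None):
        self.frontend = frontend
        self.txn_id = txn_id
        # A transaction is considered open until committed or aborted.
        self.txn_open = txn_id is not None

    def wait_internal(self, execute):
        execute()  # may raise; in that case the commit below is skipped
        if self.txn_open:
            self.frontend.commit_transaction(self.txn_id)
            self.txn_open = False

    def done(self):
        # A transaction that is still open here was never committed,
        # which means an error occurred, so abort it.
        if self.txn_open:
            self.frontend.abort_transaction(self.txn_id)
            self.txn_open = False


def run_query(state, execute):
    """Run the query and always call done(), even on failure."""
    try:
        state.wait_internal(execute)
    finally:
        state.done()
```

Tracking a single "still open" flag keeps done() idempotent: calling it
after a successful commit does nothing, and calling it after a failure
aborts exactly once.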

Testing:
* added new tables during dataload
* added acid-insert.test file with INSERT statements against the new
  tables
* added integration test with Hive to test_hms_integration.py. The test
  inserts data with Impala and reads it with Hive. (These integration
  tests run only with the exhaustive exploration strategy.)

TODO in following commits:
* add locks and heartbeats
* implement TRUNCATE (maybe in another commit)
* CTAS creates files in the 'root' directory of the table/partition.
  This is handled correctly during SELECT, but it would be better to
  create a base directory from the beginning.

Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/util/jni-util.h
M common/thrift/CatalogService.thrift
M common/thrift/DataSinks.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/acid-insert.test
M tests/metadata/test_hms_integration.py
M tests/query_test/test_insert.py
28 files changed, 765 insertions(+), 126 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/13559/11
-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>