You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@impala.apache.org by "Huaisi Xu (Code Review)" <ge...@cloudera.org> on 2016/09/09 17:54:07 UTC

[Impala-ASF-CR] IMPALA-1702: Enforce unique table ID

Huaisi Xu has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4349

Change subject: IMPALA-1702: Enforce unique table ID
......................................................................

IMPALA-1702: Enforce unique table ID

The problem is that loaded tables can be sent together
with INCOMPLETE tables after INVALIDATE METADATA. These
tables may share same table ID with some of the tables
previously because we reset() table ID counter when
INVALIDATE METADATA. This can be problematic when this
update goes to planner during analyzing: tables not yet
analyzed have the same table ID as those already analyzed.
This further causes problem in backend since we keep a map
mapping from table ID to table metadata, and tables with
same table ID overwrites each other in the map.

This commit enforces unique table ID for the lifetime of
catalog daemon by keeping a map recording already seen
table and reuse their ID whenever possible. This also
increases memory consumption upper bounded by the size of
all database name + table name.

Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
---
M fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/IncompleteTable.java
M fe/src/main/java/com/cloudera/impala/catalog/Table.java
M fe/src/main/java/com/cloudera/impala/catalog/TableId.java
M fe/src/main/java/com/cloudera/impala/catalog/TableLoader.java
M fe/src/main/java/com/cloudera/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/com/cloudera/impala/common/Id.java
M fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
8 files changed, 52 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Hello Bharath Vissapragada, Alex Behm,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4349

to look at the new patch set (#14).

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................

IMPALA-1702: Enforce single-table consistency in query analysis.

Catalogd managed table id generation causes problems when
new updates arrive at frontend while queries are being
analyzed: references to the same table may suddenly refer
to a different version, and different tables may share the
same table id. This causes problems when one table overrides
another one in the thrift descriptor table sent to backend.

This commit removes the table id from the catalog Table
object; instead frontend assigns a unique id to each table
in DescriptorTable.toThrift(). It also implements a
referencedTables_ cache in Analyzer::globalState_ so that
calling Analyzer::getTable() on the same table returns the
same reference for the same query.

Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
D fe/src/main/java/org/apache/impala/catalog/TableId.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinTableId.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
26 files changed, 193 insertions(+), 214 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/14
-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................


Patch Set 9:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4349/9//COMMIT_MSG
Commit Message:

PS9, Line 7: Enforce table level consistency accross service
IMO, this is a little misleading as its sounds similar to consistency guarantees in DBs. Instead we should have something like "Make table-id assignment query local" or something similar. Thoughts?


http://gerrit.cloudera.org:8080/#/c/4349/9/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 295:     private final HashMap<TableName, Table> referencedTables_ = Maps.newHashMap();
Is this change (maintaining referenced tables cache) still required with the latest patch design with per-query id assignment? Given Catalog doesn't assign an Id anymore, why do we need to have the same reference of the table? May be I'm missing something.


http://gerrit.cloudera.org:8080/#/c/4349/9/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

PS9, Line 187:  To ensure the table ID is unique
             :       // in the same query, assign a special CTAS ID. The CatalogServer will assign this
             :       // table a proper ID once it is created there as part of the CTAS execution.
update this as per new design.


http://gerrit.cloudera.org:8080/#/c/4349/9/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

PS9, Line 154: /**
             :    * Connect tupleDescriptors to tableDescriptors with unique table ids.
             :    * Verify table level consistency in the same query by checking references to the same
             :    * table refer to the same table instance.
Nit: This should probably be moved to the line before the logic given this is not what this method does.


Line 164:     HashMap<TableName, Table> referencedTables = Maps.newHashMap();
Same question as in referencedTables_ in Analyzer. Assuming we want to retain referencedTables_ there, why do we need to repeat it here. Can't we reuse it?


http://gerrit.cloudera.org:8080/#/c/4349/9/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

Line 37: import org.apache.impala.analysis.TableName;
Move it below to org.apache.impala group.


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4349/9/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 295:     private final HashMap<TableName, Table> referencedTables_ = Maps.newHashMap();
> Is this change (maintaining referenced tables cache) still required with th
I agree it's not required specifically to fix the table id problem, but it generally seems saner to have the guarantee. I think we can probably come up with scenarios where not having this guarantee can cause hard-to-debug problems (which are different than the table id problems).


http://gerrit.cloudera.org:8080/#/c/4349/9/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 164:     HashMap<TableName, Table> referencedTables = Maps.newHashMap();
> Same question as in referencedTables_ in Analyzer. Assuming we want to reta
The map is only for validation purposes.

The purpose of this map is to check that we've gotten tables from Analyzer.referencedTables_ in all code paths, so it doesn't make sense to validate with Analyzer.referencedTables_ itself.


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................


Patch Set 7:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4349/7/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 154:   public TDescriptorTable toThrift() {
> Let's add a comment about table id assignment and that this also checks the
Done


Line 156:     // Maps from base table to its table id used in backend.
> in the backend
Done


http://gerrit.cloudera.org:8080/#/c/4349/7/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
File fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java:

Line 42: import com.google.common.base.Preconditions;
> remove
Done


http://gerrit.cloudera.org:8080/#/c/4349/7/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

Line 1128:    * 1. All tables sent to backend have a unique Id, assigned in
> Let's rework/move this validation as follows:
Done


Line 1134:   private void validateTableIds(Analyzer analyzer, TExecRequest result)
> validateTableConsistency()?
Done


http://gerrit.cloudera.org:8080/#/c/4349/7/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

Line 93:   private final Map<Long, TTableDescriptor> tableMap_ = Maps.newHashMap();
> Integer
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................


Patch Set 7:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4349/7/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 154:   public TDescriptorTable toThrift() {
Let's add a comment about table id assignment and that this also checks the single-table consistency guarantee.


Line 156:     // Maps from base table to its table id used in backend.
in the backend


http://gerrit.cloudera.org:8080/#/c/4349/7/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
File fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java:

Line 42: import com.google.common.base.Preconditions;
remove


http://gerrit.cloudera.org:8080/#/c/4349/7/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

Line 1128:    * 1. All tables sent to backend have a unique Id, assigned in
Let's rework/move this validation as follows:

* Let's check the table-level consistency guarantee directly in DescriptorTbl.toThrift().

* Let's add validateTableIds() to PlannerTestBase that checks the uniqueness of table ids in the produced TDescriptorTable

* Get rid of this current function and remove references to IMPALA-1702.


Line 1134:   private void validateTableIds(Analyzer analyzer, TExecRequest result)
validateTableConsistency()?


http://gerrit.cloudera.org:8080/#/c/4349/7/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

Line 93:   private final Map<Long, TTableDescriptor> tableMap_ = Maps.newHashMap();
Integer


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................


Patch Set 17: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has uploaded a new patch set (#11).

Change subject: IMPALA-1702: Enforce table level consistency
......................................................................

IMPALA-1702: Enforce table level consistency

Catalogd managed table id generation causes problems when
new update arrives at frontend while queries are being
analyzing: reference to the same table may refer to a
different version, and different table may share same
table id. This causes problems when one table override
another in the backend table map.

This commit removes table Id from Table object; instead
frontend assigns a unique Id to each table just before it
is sent to backend. It also implements a referecedTables_
cache in Analyzer::globalState_ so that calling
Analyzer::getTable() on the same table returns the same
references for the same query.

Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
D fe/src/main/java/org/apache/impala/catalog/TableId.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinTableId.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
25 files changed, 168 insertions(+), 212 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/11
-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................


Patch Set 4:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 294:     // Catalog cache has the latest tables from the statestore, in order to use the
> The Impalad Catalog has ...
Done


Line 295:     // same table version in a single query, we cache all referenced tables and reuse them
> ... query , we cache all referenced tables here.
Done


Line 2299:     if (table == null) {
> remove
Done


http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

Line 187:       // SelectStmt (or the BE will be very confused). Assign a unique table ID to it.
> The previous longer comment was clearer. How about:
Done


http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 44:   // List of tables to exclude from partition pruning, and thus all partitions are sent to
> Afaik, these tables have nothing to do with partition pruning, so the name 
The only place that this actually becomes useful is for partition pruning? so for "INSERT INTO a PARTITION(b='d') select c from a;
if c is in another partition other than partition 'b', then without this partition b's metadata wont be sent. it has no other meaningful usage than this I think. maybe I just change this to a HashSet?


Line 181:       referencedPartitions = getReferencedPartitions(tbl);
> will remove this
Done


http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 141:    * Ever increasing counter for table id {@link Table#id_}. New id is given when
> TableId doesn't exist anymore.
Done


http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 64:   private static final Object metastoreAccessLock_ = new Object();
> Please remove while you are here.
remove which? ?removed INVALID_TABLE_ID. not sure if this is what you wanted.


Line 72:   public static final long LOCAL_TABLE_ID = -4;
> LOCAL_VIEW_ID is more explicit. Sorry to make you change it again.
Done


http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/planner/JoinTableId.java
File fe/src/main/java/org/apache/impala/planner/JoinTableId.java:

Line 25:   // Construction only allowed via an IdGenerator.
> newline before comment
Done


http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

Line 1126:    * Check that we don't have any duplicate table IDs (see IMPALA-1702).
> We should declare IMPALA-1702 as fixed and not mention it here. Instead, we
I added something in ImpaladCatalog.java::addTable(). not sure if that is what you suggested.k


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................


Patch Set 12:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/4349/12//COMMIT_MSG
Commit Message:

Line 7: IMPALA-1702: Enforce table level consistency
> Enforce single-table consistency in query analysis.
Done


Line 10: new update arrives at frontend while queries are being
> new updates arrive at the frontend
Done


Line 12: to a different version, and different table may share the
> different tables
Done


Line 14: another in the backend table map.
> another one in the thrift descriptor table sent to the backend.
Done


Line 16: This commit removes table id from the Table object; instead
> removes the table if from the catalog Table object
Done


Line 17: frontend assigns a unique id to each table just before
> the frontend assigns a unique id to each table in DescriptorTable.toThrift(
Done


http://gerrit.cloudera.org:8080/#/c/4349/12/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

Line 417:    * Validates that all tables in the descriptor table of 'request' have a unique id.
> Validates that all tables in the descriptor table of 'request' have a uniqu
Done


Line 422:     if (execRequest.isSetDesc_tbl()) {
> I will merge that to line 420 next patch..
Done


Line 424:       if (descTbl.isSetTableDescriptors()) {
> and merge this
Done


Line 434:         for (TTupleDescriptor tupleDesc: descTbl.tupleDescriptors) {
> also check the table sink's id (if there is a sink)
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has uploaded a new patch set (#17).

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................

IMPALA-1702: Enforce single-table consistency in query analysis.

Catalogd managed table id generation causes problems when
new updates arrive at frontend while queries are being
analyzed: references to the same table may suddenly refer
to a different version, and different tables may share the
same table id. This causes problems when one table overrides
another one in the thrift descriptor table sent to backend.

This commit removes the table id from the catalog Table
object; instead frontend assigns a unique id to each table
in DescriptorTable.toThrift(). It also implements a
referencedTables_ cache in Analyzer::globalState_ so that
calling Analyzer::getTable() on the same table returns the
same reference for the same query.

Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
---
M common/thrift/CatalogObjects.thrift
M common/thrift/Descriptors.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
D fe/src/main/java/org/apache/impala/catalog/TableId.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinTableId.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
27 files changed, 204 insertions(+), 219 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/17
-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has uploaded a new patch set (#5).

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................

IMPALA-1702: Enforce table level consistency accross service

Problems:
1. CatalogServiceCatalog::reset() resets table ID, so same
tables can have different ID and different tables can have
same table ID before/after this call. Backend could
overwrites each table in an 1:1 table ID to table descriptor
map.

2. 1 can happen because frontend gets table directly
from catalog cache, which is updated concurrently with
query analyzing. so Analyzer::getTable() could return
different references for the same table when analyzing
the same query.

This commit solves different table having same table ID
issue by changing table ID counter to an ever increasing
64 bits integer. It solves same table with different table
issue ID by keeping a referecedTables_ cache in
Analyzer::globalState_ so that calling Analyzer::getTable()
on the same table returns the same references for the same
query.

Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
---
M common/thrift/Types.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
D fe/src/main/java/org/apache/impala/catalog/TableId.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinTableId.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
26 files changed, 118 insertions(+), 122 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

PS6, Line 1105: /**
              :    * Our single query table consistency guarantee:
              :    * 1. All referenced tables have a unique id
              :    * 2. All references of the same table refer to the same version(indicated by the table
              :    * id).
              :    * Validate 2 by checking same tables have the same table instance.
              :    */
will update


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce unique table ID

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-1702: Enforce unique table ID
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4349/2/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

Line 188:     private final HashMap<TTableName, TableId> tblIdMap_ =
> Alex, Bharath, another possibility is that given our asymmetry architecture
Isn't that ok? Once you drop+create a table with same name, it isn't essentially the "same" table anymore, for ex: its schema could've changed. IMO thats fine. I think this is what Alex meant when he mentioned that drop+create should create a new table ID.


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce unique table ID

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1702: Enforce unique table ID
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4349/2/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

Line 188:     private final HashMap<TTableName, TableId> tblIdMap_ =
> yeah, but ideally any table change should trigger a new analyzing just like
You raise a good point. Maybe we should consider how to ensure at least table-level consistency during analysis. For example, one way could be to add referenced Tables into the Analyzer such that subsequent references will see the same Table object (instead of going to the Catalog every time).

Seems relatively easy to do.

Dimitris also has a good point that distinguishing the drop+add case from the invalidate case may be hard, but I think the proposed Analyzer change above should handle that, right?


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................


Patch Set 15:

I'd prefer to not address that question in this patch because we'll just end up with another Catalog subclass and an instance of that inside the Analyzer's global state. To me that just makes things even more complicated.

One option is to leave out the Table cache altogether. The important change here is to remove the global table ids.

-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency
......................................................................


Patch Set 10:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4349/10/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 155:    * Connect tupleDescriptors to tableDescriptors with unique table ids and get
> Returns the thrift representation of this DescriptorTable. Assigns unique i
Done


http://gerrit.cloudera.org:8080/#/c/4349/10/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

Line 418:    * Our single query table consistency guarantee:
> Way too elaborate this comment. just say:
Done


Line 422:    * 2. All tables reference to the same version of table in Analyzer::getTable().
> Remove this block
Done


Line 426:   private void validateTableConsistency(TExecRequest request) {
> validateTableIds
Done


Line 433:         for (TTableDescriptor tableDesc: descTbl.tableDescriptors) {
> This code seems pretty elaborate for checking unique ids. Wouldn't it be su
Done. but that way we do not know the old table's name.


Line 437:             throw new IllegalStateException("Failed to verify table consistency for" +
> We're not validating table consistency. Error report should be consistent w
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has uploaded a new patch set (#8).

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................

IMPALA-1702: Enforce table level consistency accross service

Problems:
1. CatalogServiceCatalog::reset() resets table ID, so same
tables can have different ID and different tables can have
same table ID before/after this call. Backend could
overwrites each table in an 1:1 table ID to table descriptor
map.

2. 1 can happen because frontend gets table directly
from catalog cache, which is updated concurrently with
query analyzing. so Analyzer::getTable() could return
different references for the same table when analyzing
the same query.

This commit removes table Id from Table object; instead
frontend assigns a unique Id to each table just before it
is sent to backend. It also implements a referecedTables_
cache in Analyzer::globalState_ so that calling
Analyzer::getTable() on the same table returns the same
references for the same query.

Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableName.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
D fe/src/main/java/org/apache/impala/catalog/TableId.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinTableId.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
26 files changed, 178 insertions(+), 209 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/8
-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has uploaded a new patch set (#3).

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................

IMPALA-1702: Enforce table level consistency accross service

Problems:
1. CatalogServiceCatalog::reset() resets table ID, so same
tables can have different ID and different tables can have
same table ID before/after this. If backend sees these,
it will overwrites each other in an 1:1 table ID to table
descriptor map.

2. 1 can happen because frontend gets table directly
from catalog cache, which is updated concurrently with
other activities, so Analyzer::getTable() can return
different references for the same table when analyzing
the same query.

This commit solves different table having same table ID by
changing table ID counter to an ever increasing 64 bits
integer. It solves same table with different table ID by
keeping a referecedTables_ cache in Analyzer::globalState_
so that calling Analyzer::getTable() on the same table
returns the same references for the same query.

Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
---
M common/thrift/Types.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
D fe/src/main/java/org/apache/impala/catalog/TableId.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
24 files changed, 82 insertions(+), 157 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................


Patch Set 4:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 294:     // Catalog cache has the latest tables from the statestore, in order to use the
The Impalad Catalog has ...

Sentence is pretty long, use a few "."


Line 295:     // same table version in a single query, we cache all referenced tables and reuse them
... query , we cache all referenced tables here.


Line 2299:     if (table == null) {
remove


http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

Line 187:       // SelectStmt (or the BE will be very confused). Assign a unique table ID to it.
The previous longer comment was clearer. How about:

To ensure the table ID us unique within this query assign a special CTAS ID.


http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 44:   // List of tables to exclude from partition pruning, and thus all partitions are sent to
Afaik, these tables have nothing to do with partition pruning, so the name is misleading.

We need to add register these tables so that the BE can see the table metadata in the DesciptorTable.

How about tblsWithoutTupleDesc?


http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 141:    * Ever increasing counter for table id {@link Table#id_}. New id is given when
TableId doesn't exist anymore.


Ever increasing counter for table ids. A table is given a new id when loading its metadata for the first time or when doing a refresh/invalidate.


http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 64:   private static final Object metastoreAccessLock_ = new Object();
Please remove while you are here.


Line 72:   public static final long LOCAL_TABLE_ID = -4;
LOCAL_VIEW_ID is more explicit. Sorry to make you change it again.


http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/planner/JoinTableId.java
File fe/src/main/java/org/apache/impala/planner/JoinTableId.java:

Line 25:   // Construction only allowed via an IdGenerator.
newline before comment


http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

Line 1126:    * Check that we don't have any duplicate table IDs (see IMPALA-1702).
We should declare IMPALA-1702 as fixed and not mention it here. Instead, we should describe our expectations:

1. All referenced tables have a unique id
2. All references of the same table refer to the same version (indicated by the table id)


I'd still like to see some validation when receiving updates from the Statestore. I mentioned that in the last round and I think it's important we add that to make sure the system behaves as expected.


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has uploaded a new patch set (#16).

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................

IMPALA-1702: Enforce single-table consistency in query analysis.

Catalogd managed table id generation causes problems when
new updates arrive at frontend while queries are being
analyzed: references to the same table may suddenly refer
to a different version, and different tables may share the
same table id. This causes problems when one table overrides
another one in the thrift descriptor table sent to backend.

This commit removes the table id from the catalog Table
object; instead frontend assigns a unique id to each table
in DescriptorTable.toThrift(). It also implements a
referencedTables_ cache in Analyzer::globalState_ so that
calling Analyzer::getTable() on the same table returns the
same reference for the same query.

Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
---
M common/thrift/CatalogObjects.thrift
M common/thrift/Descriptors.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
D fe/src/main/java/org/apache/impala/catalog/TableId.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinTableId.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
27 files changed, 199 insertions(+), 213 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/16
-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has uploaded a new patch set (#10).

Change subject: IMPALA-1702: Enforce table level consistency
......................................................................

IMPALA-1702: Enforce table level consistency

Catalogd managed table id generation causes problems when
new update arrives at frontend while queries are being
analyzing: reference to the same table may refer to a
different version, and different table may share same
table id. This causes problems when one table override
another in the backend table map.

This commit removes table Id from Table object; instead
frontend assigns a unique Id to each table just before it
is sent to backend. It also implements a referecedTables_
cache in Analyzer::globalState_ so that calling
Analyzer::getTable() on the same table returns the same
references for the same query.

Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
D fe/src/main/java/org/apache/impala/catalog/TableId.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinTableId.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
25 files changed, 176 insertions(+), 212 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/10
-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency
......................................................................


Patch Set 11:

(8 comments)

thanks!

http://gerrit.cloudera.org:8080/#/c/4349/11//COMMIT_MSG
Commit Message:

Line 11: analyzing: reference to the same table may refer to a
> analyzed
Done


Line 12: different version, and different table may share same
> the same
Done


Line 16: This commit removes table Id from Table object; instead
> the Table
Done


PS11, Line 17: it
> execRequest
Done


PS11, Line 21: references
> reference
Done


http://gerrit.cloudera.org:8080/#/c/4349/11/fe/src/main/java/org/apache/impala/catalog/View.java
File fe/src/main/java/org/apache/impala/catalog/View.java:

PS11, Line 92: testig
> not your change, but please fix the typo.
Done.


http://gerrit.cloudera.org:8080/#/c/4349/11/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

Line 426:         HashSet<Integer> seenTableId = Sets.newHashSet();
> seenTableIds
Done


Line 435:       }
> Sorry to keep adding to this, but let's also loop through all tuple descrip
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce unique table ID

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1702: Enforce unique table ID
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4349/2/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

Line 188:     private final HashMap<TTableName, TableId> tblIdMap_ =
> Isn't that ok? Once you drop+create a table with same name, it isn't essent
I think that behavior is actually desired. If a table with the same name was dropped+added it *should* have a different table id.


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce unique table ID

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce unique table ID
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4349/2/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

Line 188:     private final HashMap<TTableName, TableId> tblIdMap_ =
> this is a good idea! I will try to do this. but for 64bit thing I think it 
Alex, Bharath, another possibility is that given our asymmetry architecture, one impalad can drop and create a new table with the same name. if this table gets updated in a single catalog update to another impalad, then we might see same table with different table ID as well.


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................


Patch Set 15:

> (3 comments)

Should I do the same for other catalog object as well? e.g. data source?

-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-1702: Enforce unique table ID

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-1702: Enforce unique table ID
......................................................................


Patch Set 2:

If we use a long (64-bits) do we even need to track which IDs are in use?

I calculate that if we allocate 1 million IDs a second it would take almost 300,000 years to run out of 64-bit integers. In python:

  >>> (2 ** 63) / 1000000 / 60 / 60 / 24 / 365
  292471L

-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................


Patch Set 15:

The concern that Marcel raised is also related to https://issues.cloudera.org/browse/IMPALA-3568 and I agree with Alex that this JIRA may not be the best vehicle to address it. I am ok with keeping the Table Cache and documenting the expected behavior.

-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4349/14/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

PS14, Line 44: Table targetTable_
will update comment. basically this is used to reserve id 0 to table sink because we have a separate logic in backend handling table sinks. this is also used in partition pruning.


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency
......................................................................


Patch Set 10:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4349/9//COMMIT_MSG
Commit Message:

PS9, Line 7: Enforce table level consistency
> IMO, this is a little misleading as its sounds similar to consistency guara
I think this is just a high level description. It hides implementation details. This is the goal we are trying to achieve. and it contains two things.


http://gerrit.cloudera.org:8080/#/c/4349/9/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 295:     private final HashMap<TableName, Table> referencedTables_ = Maps.newHashMap();
> Thanks for the explanation Alex. Makes sense to me.
Although this is not strictly necessary, this is more of completing the circle.


http://gerrit.cloudera.org:8080/#/c/4349/9/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

PS9, Line 187: 
             :       table.load(true, client.getHiveClient(), msTbl);
             :       insertStmt_.setTargetTable(table);
> update this as per new design.
Done


http://gerrit.cloudera.org:8080/#/c/4349/9/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

PS9, Line 154: /**
             :    * Connect tupleDescriptors to tableDescriptors with unique table ids and get
             :    * this DescriptorTable ready to be sent to backend.
             :    */
> How about we prefix this comment with a brief description of what this meth
Done


Line 164: 
> Sorry, I see what you mean now. Yes, that would work as well, but we'd need
This is only for table validation so same table references have the same instance. If this is for validation purposes then it makes more sense to isolate this from anything we used in the code.


http://gerrit.cloudera.org:8080/#/c/4349/9/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

Line 37: import org.apache.kudu.client.KuduClient;
> Move it below to org.apache.impala group.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4349/3//COMMIT_MSG
Commit Message:

PS3, Line 12: If backend sees these,
            : it will overwrites each other in an 1:1 table ID to table
            : descriptor map.
might need a little rephrase.


PS3, Line 18: activities
operations?


PS3, Line 22:  
..issue.. (same in next statement too.)


http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

PS3, Line 296: HashMap
I think its better to use the parent Map<> class here (that seems to be consistent usage elsewhere in the code).


PS3, Line 2282: Returns the Catalog Table object for the given database and table name.
              :    * Adds the table to this analyzer's "missingTbls_" and throws an AnalysisException if
              :    * the table has not yet been loaded in the local catalog cache.
              :    * Throws an AnalysisException if the table or the db does not exist in the Catalog.
              :    * This function does not register authorization requests and does not log access events
Please update this to reflect the new logic.


http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

PS3, Line 45: targetTable_
Just curious what the rationale behind this clean up. Is it because we use it only for the inserts? I don't know if we intend to use a bunch of referencedTables_ in the future for any other usecase in which case this change should again be undone. Alex, any idea?


http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 34: import java.util.concurrent.atomic.AtomicInteger;
Unused import now that we switched to Long?


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce unique table ID

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1702: Enforce unique table ID
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4349/2/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

Line 188:     private final HashMap<TTableName, TableId> tblIdMap_ =
> +1. Given this happens only with "invalidate metadata", how about we change
This seems like the simplest solution to me.


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................


Patch Set 13: Code-Review+1

(1 comment)

Nice work! Please address the remaining nits.

Henry or Marcel, can you sign off on the Catalog changes? I'm +2 for the FE changes.

http://gerrit.cloudera.org:8080/#/c/4349/13/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

Line 452:         if (seenTableIds.isEmpty()
empty check not needed


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................


Patch Set 6:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 44:   // The output table of an insert query.
> I still don't understand what this has to do with partition pruning. The pu
Done


http://gerrit.cloudera.org:8080/#/c/4349/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 34: import java.util.concurrent.atomic.AtomicLong;
remove next patch


Line 141:   private final TableLoadingMgr tableLoadingMgr_;
> Sorry I got confused, you can remove this sentence. I incorrectly thought t
done


http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 64: 
> The metastoreAccessLock_ is unused. Please remove.
Done


http://gerrit.cloudera.org:8080/#/c/4349/5/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 79:   protected final ArrayList<Column> colsByPos_ = Lists.newArrayList();
> As discussed, thinking about this a little more I think the right fix is to
kind of done. what do you think


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................


Patch Set 4:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/4349/3//COMMIT_MSG
Commit Message:

PS3, Line 12: all. Backend could
            : overwrites each table in an 1:1 table ID to table descriptor
            : map.
> might need a little rephrase.
what about this?


PS3, Line 18: analyzing.
> operations?
Done


PS3, Line 22: 
> ..issue.. (same in next statement too.)
Done


http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 294:     // Catalog cache has the latest tables from the statestore, in order to use the
> "Caches all tables referenced in a single query"
Done


PS3, Line 296: same qu
> I think its better to use the parent Map<> class here (that seems to be con
I saw a lot of HashMap in the code as well?


PS3, Line 2282: 
              :    * Returns the Catalog Table object for the given database and table name. The same
              :    * table reference is returned for multiple references in the same query.
              :    * Adds the table to this analyzer's "missingTbls_" and throws an AnalysisException if
              :    * the table has not yet been loaded in the local catalog cache.
> Please update this to reflect the new logic.
Done


Line 2290:   public Table getTable(String dbName, String tableName)
> I'm not sure if this will deal with case-insensitivity properly. Why not us
Done


Line 2291:       throws AnalysisException, TableLoadingException {
> How about this:
Done


http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

PS3, Line 45: mple, the ou
> Just curious what the rationale behind this clean up. Is it because we use 
The reason for this is that previously I tried to add the cache here and wanted to use the name referencedTable_, which was already used.. So I have to give it another name, which I failed to think of.. I ended up just replace this with something else, and I am not sure if I should change it back.. the name is quite confusing for now because we only uses it for insert table. we definitely should change its name to be something sounds like more specific for partition pruning.


http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 34: import java.util.concurrent.atomic.AtomicLong;
> Unused import now that we switched to Long?
done.. there are many unused import.. I will leave the rest there?


Line 141:    * Ever increasing counter for table id {@link Table#id_}. New id is given when
> Add comment that explains the use and behavior of this id, i.e. the id is e
Done


http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 65:   // IDs not covered in all other cases.
> Add comment for these subtly different special ids.
Done


Line 67:   // ID used for IncompleteTable when it is uninitialized.
> I think we should add LOCAL_TABLE_ID and TEST_TABLE_ID as well and describe
Done


http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/common/Id.java
File fe/src/main/java/org/apache/impala/common/Id.java:

Line 42: 
> revert
Done


Line 47: 
> revert
Done


http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

Line 1130
> I think we should still maintain something like this to check and document 
Done. the new guarantee in Impala frontend is that same query uses the same table reference for the same table. Different tables with same ID is guaranteed in catalog.


Line 1134
> There is similar code in a few places in the BE. We should remove that, too
as discussed offline. wont do


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Hello Bharath Vissapragada,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4349

to look at the new patch set (#12).

Change subject: IMPALA-1702: Enforce table level consistency
......................................................................

IMPALA-1702: Enforce table level consistency

Catalogd managed table id generation causes problems when
new update arrives at frontend while queries are being
analyzed: references to the same table may suddenly refer
to a different version, and different table may share the
same table id. This causes problems when one table overrides
another in the backend table map.

This commit removes table id from the Table object; instead
frontend assigns a unique id to each table just before
execRequest is sent to backend. It also implements a
referecedTables_ cache in Analyzer::globalState_ so that
calling Analyzer::getTable() on the same table returns the
same reference for the same query.

Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
D fe/src/main/java/org/apache/impala/catalog/TableId.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinTableId.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
25 files changed, 174 insertions(+), 213 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/12
-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1702: Enforce unique table ID

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-1702: Enforce unique table ID
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4349/2/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

Line 188:     private final HashMap<TTableName, TableId> tblIdMap_ =
> I think that behavior is actually desired. If a table with the same name wa
I think it may be hard to differentiate between a table dropped + added and a table reloaded after an invalidate metadata.


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency
......................................................................


Patch Set 11:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4349/11//COMMIT_MSG
Commit Message:

Line 11: analyzing: reference to the same table may refer to a
analyzed


Line 12: different version, and different table may share same
the same


Line 16: This commit removes table Id from Table object; instead
the Table


PS11, Line 17: it
execRequest


PS11, Line 21: references
reference


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce unique table ID

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has uploaded a new patch set (#2).

Change subject: IMPALA-1702: Enforce unique table ID
......................................................................

IMPALA-1702: Enforce unique table ID

The problem is that loaded tables can be sent together
with INCOMPLETE tables after INVALIDATE METADATA. These
tables may share same table ID with some of the tables
previously because we reset() table ID counter when
INVALIDATE METADATA. This can be problematic when this
update goes to planner during analyzing: tables not yet
analyzed have the same table ID as those already analyzed.
This further causes problem in backend since we keep a map
mapping from table ID to table metadata, and tables with
same table ID overwrites each other in the map.

This commit enforces unique table ID for the lifetime of
catalog daemon by keeping a map recording already seen
table and reuse their ID whenever possible. This also
increases memory consumption upper bounded by the size of
all database name + table name.

Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
---
M fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/IncompleteTable.java
M fe/src/main/java/com/cloudera/impala/catalog/Table.java
M fe/src/main/java/com/cloudera/impala/catalog/TableId.java
M fe/src/main/java/com/cloudera/impala/catalog/TableLoader.java
M fe/src/main/java/com/cloudera/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/com/cloudera/impala/common/Id.java
M fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
8 files changed, 53 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4349/6/common/thrift/Types.thrift
File common/thrift/Types.thrift:

Line 22: typedef i64 TTableId
now i32 is ok I think.


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 70: import org.apache.impala.thrift.TTableName;
will remove


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 34: import java.util.concurrent.atomic.AtomicLong;
will remove


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

Line 38: import org.apache.impala.analysis.*;
not intentional.. will revert next patch!


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................


Patch Set 14:

(4 comments)

thanks

http://gerrit.cloudera.org:8080/#/c/4349/14/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 366
> there's a TTableDescriptor.table_id which deserves a comment (clarify that 
Done


http://gerrit.cloudera.org:8080/#/c/4349/14/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 294:     // same version of table in a single query, we cache all referenced tables here.
> "of a table"
Done.

Yes. but I think we definitely should not cache Db at this time since we maybe expecting a "complete" table arriving. For others, I am not sure about the others. maybe we should consider those in another Jira?


Line 2284:    * the table has not yet been loaded in the local catalog cache.
> update comment: effect on referencedTables_
Done


http://gerrit.cloudera.org:8080/#/c/4349/14/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

PS14, Line 44: Table targetTable_
> will update comment. basically this is used to reserve id 0 to table sink b
Done, but may need reword?


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 181:       referencedPartitions = getReferencedPartitions(tbl);
will remove this


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................


Patch Set 17: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has uploaded a new patch set (#6).

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................

IMPALA-1702: Enforce table level consistency accross service

Problems:
1. CatalogServiceCatalog::reset() resets table ID, so same
tables can have different ID and different tables can have
same table ID before/after this call. Backend could
overwrites each table in an 1:1 table ID to table descriptor
map.

2. 1 can happen because frontend gets table directly
from catalog cache, which is updated concurrently with
query analyzing. so Analyzer::getTable() could return
different references for the same table when analyzing
the same query.

This commit removes table Id, and keeps a referecedTables_
cache in Analyzer::globalState_ so that calling
Analyzer::getTable() on the same table returns the same
references for the same query.

Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
---
M common/thrift/CatalogObjects.thrift
M common/thrift/Types.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableName.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
D fe/src/main/java/org/apache/impala/catalog/TableId.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinTableId.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
28 files changed, 151 insertions(+), 201 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Hello Marcel Kornacker, Bharath Vissapragada, Alex Behm,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4349

to look at the new patch set (#16).

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................

IMPALA-1702: Enforce single-table consistency in query analysis.

Catalogd managed table id generation causes problems when
new updates arrive at frontend while queries are being
analyzed: references to the same table may suddenly refer
to a different version, and different tables may share the
same table id. This causes problems when one table overrides
another one in the thrift descriptor table sent to backend.

This commit removes the table id from the catalog Table
object; instead frontend assigns a unique id to each table
in DescriptorTable.toThrift(). It also implements a
referencedTables_ cache in Analyzer::globalState_ so that
calling Analyzer::getTable() on the same table returns the
same reference for the same query.

Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
---
M common/thrift/CatalogObjects.thrift
M common/thrift/Descriptors.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
D fe/src/main/java/org/apache/impala/catalog/TableId.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinTableId.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
27 files changed, 199 insertions(+), 213 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/16
-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................


Patch Set 15:

> The concern that Marcel raised is also related to https://issues.cloudera.org/browse/IMPALA-3568
 > and I agree with Alex that this JIRA may not be the best vehicle to
 > address it. I am ok with keeping the Table Cache and documenting
 > the expected behavior.

thanks

-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency
......................................................................


Patch Set 12:

(8 comments)

Almost there

http://gerrit.cloudera.org:8080/#/c/4349/12//COMMIT_MSG
Commit Message:

Line 7: IMPALA-1702: Enforce table level consistency
Enforce single-table consistency in query analysis.


Line 10: new update arrives at frontend while queries are being
new updates arrive at the frontend


Line 12: to a different version, and different table may share the
different tables


Line 14: another in the backend table map.
another one in the thrift descriptor table sent to the backend.


Line 16: This commit removes table id from the Table object; instead
removes the table if from the catalog Table object


Line 17: frontend assigns a unique id to each table just before
the frontend assigns a unique id to each table in DescriptorTable.toThrift().


http://gerrit.cloudera.org:8080/#/c/4349/12/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

Line 417:    * Validates that all tables in the descriptor table of 'request' have a unique id.
Validates that all tables in the descriptor table of 'request' have a unique id and those are properly referenced by other thrift structures (tuple descriptors, table sink).


Line 434:         for (TTupleDescriptor tupleDesc: descTbl.tupleDescriptors) {
also check the table sink's id (if there is a sink)


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................


Patch Set 15: Code-Review+2

(10 comments)

http://gerrit.cloudera.org:8080/#/c/4349/15/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 294:     // same version of a table in a single query, we cache all referenced tables here.
add todo that we need to investigate whether to do this for other catalog objects as well


http://gerrit.cloudera.org:8080/#/c/4349/15/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 43:   // The target table of a table sink, at most one for a single query.
"may be null" is a better phrasing.


Line 44:   // Table id 0 is reserved for it. Set in analyze() query statement that produces a
"Set in QueryStmt.analyze()."


Line 45:   // table sink, e.g. InsertStmt.analyze(), ModifyStmt.analyze(). Also, partition pruning
remove last sentence, that goes without saying.


Line 179:         // Verify table level consistency in the same query by checking references to the
"checking that"


Line 180:         // same table refer to the same table instance.
"same Table instance" is even more specific (because we're talking about objects from that class).


Line 202:       if (tbl != targetTable_) {
single line


http://gerrit.cloudera.org:8080/#/c/4349/15/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java:

Line 201:     if (tableId != null) {
single line


http://gerrit.cloudera.org:8080/#/c/4349/15/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

Line 438:       if (descTbl.isSetTupleDescriptors()) {
precede w/ blank line


Line 447:     if (execRequest.isSetFragments() && !execRequest.fragments.isEmpty()) {
same here


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce unique table ID

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce unique table ID
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4349/1/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

Line 191:     public TableId getUniqueId(TTableName tblName) {
have race here, will fix..


http://gerrit.cloudera.org:8080/#/c/4349/1/fe/src/main/java/com/cloudera/impala/catalog/TableLoader.java
File fe/src/main/java/com/cloudera/impala/catalog/TableLoader.java:

Line 57:   public Table load(Db db, TTableName tblName) throws DatabaseNotFoundException {
will remove


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................


Patch Set 8:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4349/8/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 154:   // Connect tupleDescriptors to tableDescriptors with unique table ids.
Use /** */ style comment


Line 156:   // table are the same instances.
are the same instance -> refer to the same table instance.


Line 162:     HashMap<TableName, Table> referencedTables = Maps.newHashMap();
also add targetTable_ to this map if needed


Line 173:         Table oldTable = referencedTables.get(tblName);
"old" seems a little odd here, maybe something like checkTable?


http://gerrit.cloudera.org:8080/#/c/4349/8/fe/src/main/java/org/apache/impala/analysis/TableName.java
File fe/src/main/java/org/apache/impala/analysis/TableName.java:

Line 49:   public TableName(Table table) {
Table.getTableName() does the same thing


http://gerrit.cloudera.org:8080/#/c/4349/8/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

Line 421:    * 2. All tables reference to the same version of table in Analyzer::getTable().
DescriptorTable.toThrift() already does this, so we don't need to check it again here.

We only need to validate that table ids are unique here.


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4349/9/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

PS9, Line 154: /**
             :    * Connect tupleDescriptors to tableDescriptors with unique table ids.
             :    * Verify table level consistency in the same query by checking references to the same
             :    * table refer to the same table instance.
> Nit: This should probably be moved to the line before the logic given this 
How about we prefix this comment with a brief description of what this method does? I'd prefer to keep the validation comment in the class comment, there seems to be no good place to put it inside the method, and callers might be interested in knowing about the behavior without reading the whole method.


Line 164:     HashMap<TableName, Table> referencedTables = Maps.newHashMap();
> The map is only for validation purposes.
Sorry, I see what you mean now. Yes, that would work as well, but we'd need to pass the Analyzer in here which does not seem to make sense. An alternative approach is to move Analyzer.referencedTables_ into the DescriptorTable. 

Either the current or the latter solutions work for me. We don't save/lose much either way.


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................


Patch Set 14: Code-Review+1

(2 comments)

carry Alex +1

http://gerrit.cloudera.org:8080/#/c/4349/14/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

PS14, Line 429:  
extra space.... will remove in next patch


PS14, Line 439:  
and here


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce unique table ID

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-1702: Enforce unique table ID
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4349/2/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

Line 188:     private final HashMap<TTableName, TableId> tblIdMap_ =
> You raise a good point. Maybe we should consider how to ensure at least tab
I agree with Alex. The best solution is to ensure a query is working on a snapshot of the metadata and avoid going to the catalog multiple times especially for the same table.


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................


Patch Set 14:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4349/14/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 366
there's a TTableDescriptor.table_id which deserves a comment (clarify that this is a query-local id)


http://gerrit.cloudera.org:8080/#/c/4349/14/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 294:     // same version of table in a single query, we cache all referenced tables here.
"of a table"

shouldn't this also apply to other top-level objects from the catalog? there's analyzer.getDb() and there are numerous references to analyzer.getCatalog() elsewhere (data source, auth policy, hdfs cache pool, ...)


Line 2284:    * the table has not yet been loaded in the local catalog cache.
update comment: effect on referencedTables_


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency
......................................................................


Patch Set 12:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4349/12/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

Line 422:     if (execRequest.isSetDesc_tbl()) {
I will merge that to line 420 next patch..


Line 424:       if (descTbl.isSetTableDescriptors()) {
and merge this


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................


Patch Set 3:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 294:     // Caches all table used in a single query. Use the same reference instead of getting
"Caches all tables referenced in a single query"

The comment should explain why we have this local cache. The main motivation is to guarantee single table consistency within a query, i.e., that within one query the same version of a table is used.

Note that the catalog cache always contains the latest version of a table.


Line 2290:     TTableName tblName = new TTableName(dbName, tableName);
I'm not sure if this will deal with case-insensitivity properly. Why not use a TableName instead? The equals() and hashCode() definitely do the right thing.


Line 2291:     Table table = globalState_.referencedTables_.get(tblName);
How about this:
if (table != null) {
  // Return query-local version of table.
  Preconditions.checkState(table.isLoaded());
  return table;
}


http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 141:   protected final AtomicLong nextTableId_ = new AtomicLong(0);
Add comment that explains the use and behavior of this id, i.e. the id is ever-increasing and only gets reset when the service is re-started. Mention at which points a table gets assigned a new id.


http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 65:   public static final long INVALID_TABLE_ID = -1;
Add comment for these subtly different special ids.


Line 67:   public static final long ERROR_METADATA_LOADING_ID = -3;
I think we should add LOCAL_TABLE_ID and TEST_TABLE_ID as well and describe their uses.


http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/common/Id.java
File fe/src/main/java/org/apache/impala/common/Id.java:

Line 42:     return Long.valueOf(id_).hashCode();
revert


Line 47:     return Long.toString(id_);
revert


http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

Line 1130
I think we should still maintain something like this to check and document our new guarantees.

In addition, it would be good to check our invariants in other places if possible. For example, when we receive an update for a table from the statestore, we should be able to guarantee that the currentTableId <= newTableId iff currentCatalogVersion < catalogVersion. Or something along those lines, I think you get the idea :)


Line 1134
There is similar code in a few places in the BE. We should remove that, too.


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce unique table ID

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce unique table ID
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4349/2/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

Line 188:     private final HashMap<TTableName, TableId> tblIdMap_ =
> I think it may be hard to differentiate between a table dropped + added and
why we want to differentiate this? we do not do this now as well right?


Line 188:     private final HashMap<TTableName, TableId> tblIdMap_ =
> I think that behavior is actually desired. If a table with the same name wa
yeah, but ideally any table change should trigger a new analyzing just like incomplete table would do in the middle of analyze. so we do not have the case when for 100 sub queries, half of them are analyzed based on table version a, the second half on version b. and backend ends up with two version of the same table as well. If we keep them the same table ID for the same table name, then at least backend can do things on a single version of table? I know that may not matter.. anyway, I will do what you guys suggested... thanks


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce unique table ID

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-1702: Enforce unique table ID
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4349/2/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

Line 188:     private final HashMap<TTableName, TableId> tblIdMap_ =
> Why do we need another map? Don't we already have a map of existing tables 
+1. Given this happens only with "invalidate metadata", how about we change just the reset() to re-use existing Ids (by looking up from the existing dbCache, rather than resetting to 0) and other places just uses a new unique id. This solves two issues.

1. Invalidate won't change the Id.
2. drop+create of same table creates a new Id.

Going by Tim's stats, if we use a 64-bit, we can use an ever-increasing Id. Thoughts?


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................


Patch Set 14:

(3 comments)

thanks

http://gerrit.cloudera.org:8080/#/c/4349/13/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

PS13, Line 425: HashSet<Integer> seenTableIds = Sets.newHashSet();
              : 
> will clean this.
Done


Line 439:         for (TTupleDescriptor tupleDesc : descTbl.tupleDescriptors) {
> if (execRequest.isSetTupleDescriptors()) {
Done


Line 452:         if (!seenTableIds.contains(tableSink.target_table_id)
> empty check not needed
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has uploaded a new patch set (#9).

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................

IMPALA-1702: Enforce table level consistency accross service

Problems:
1. CatalogServiceCatalog::reset() resets table ID, so same
tables can have different ID and different tables can have
same table ID before/after this call. Backend could
overwrites each table in an 1:1 table ID to table descriptor
map.

2. 1 can happen because frontend gets table directly
from catalog cache, which is updated concurrently with
query analyzing. so Analyzer::getTable() could return
different references for the same table when analyzing
the same query.

This commit removes table Id from Table object; instead
frontend assigns a unique Id to each table just before it
is sent to backend. It also implements a referecedTables_
cache in Analyzer::globalState_ so that calling
Analyzer::getTable() on the same table returns the same
references for the same query.

Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
D fe/src/main/java/org/apache/impala/catalog/TableId.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinTableId.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
25 files changed, 178 insertions(+), 209 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/9
-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Hello Bharath Vissapragada, Alex Behm,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4349

to look at the new patch set (#15).

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................

IMPALA-1702: Enforce single-table consistency in query analysis.

Catalogd managed table id generation causes problems when
new updates arrive at frontend while queries are being
analyzed: references to the same table may suddenly refer
to a different version, and different tables may share the
same table id. This causes problems when one table overrides
another one in the thrift descriptor table sent to backend.

This commit removes the table id from the catalog Table
object; instead frontend assigns a unique id to each table
in DescriptorTable.toThrift(). It also implements a
referencedTables_ cache in Analyzer::globalState_ so that
calling Analyzer::getTable() on the same table returns the
same reference for the same query.

Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
---
M common/thrift/CatalogObjects.thrift
M common/thrift/Descriptors.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
D fe/src/main/java/org/apache/impala/catalog/TableId.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinTableId.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
27 files changed, 198 insertions(+), 214 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/15
-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................


Patch Set 15:

Marcel, any more comments?

-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-1702: Enforce unique table ID

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce unique table ID
......................................................................


Patch Set 2:

> If we use a long (64-bits) do we even need to track which IDs are
 > in use?
 > 
 > I calculate that if we allocate 1 million IDs a second it would
 > take almost 300,000 years to run out of 64-bit integers. In python:
 > 
 > >>> (2 ** 63) / 1000000 / 60 / 60 / 24 / 365
 > 292471L

do you think same table with different ID is ok? that is possible when just increasing table ID

-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has uploaded a new patch set (#4).

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................

IMPALA-1702: Enforce table level consistency accross service

Problems:
1. CatalogServiceCatalog::reset() resets table ID, so same
tables can have different ID and different tables can have
same table ID before/after this call. Backend could
overwrites each table in an 1:1 table ID to table descriptor
map.

2. 1 can happen because frontend gets table directly
from catalog cache, which is updated concurrently with
query analyzing. so Analyzer::getTable() could return
different references for the same table when analyzing
the same query.

This commit solves different table having same table ID
issue by changing table ID counter to an ever increasing
64 bits integer. It solves same table with different table
issue ID by keeping a referecedTables_ cache in
Analyzer::globalState_ so that calling Analyzer::getTable()
on the same table returns the same references for the same
query.

Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
---
M common/thrift/Types.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
D fe/src/main/java/org/apache/impala/catalog/TableId.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinTableId.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
25 files changed, 121 insertions(+), 131 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has uploaded a new patch set (#7).

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................

IMPALA-1702: Enforce table level consistency accross service

Problems:
1. CatalogServiceCatalog::reset() resets table ID, so same
tables can have different ID and different tables can have
same table ID before/after this call. Backend could
overwrites each table in an 1:1 table ID to table descriptor
map.

2. 1 can happen because frontend gets table directly
from catalog cache, which is updated concurrently with
query analyzing. so Analyzer::getTable() could return
different references for the same table when analyzing
the same query.

This commit removes table Id from Table object; instead
frontend assigns a unique Id to each table just before it is
sent to backend. It also implements a referecedTables_
cache in Analyzer::globalState_ so that calling
Analyzer::getTable() on the same table returns the same
references for the same query.

Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableName.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
D fe/src/main/java/org/apache/impala/catalog/TableId.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinTableId.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
27 files changed, 155 insertions(+), 188 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/7
-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency
......................................................................


Patch Set 11: Code-Review+1

(1 comment)

The change looks fine to me. Thanks for working on this Huaisi.

http://gerrit.cloudera.org:8080/#/c/4349/11/fe/src/main/java/org/apache/impala/catalog/View.java
File fe/src/main/java/org/apache/impala/catalog/View.java:

PS11, Line 92: testig
not your change, but please fix the typo.


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................


Patch Set 16:

(10 comments)

Thanks Marcel.

http://gerrit.cloudera.org:8080/#/c/4349/15/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 294:     // The Impalad Catalog has the latest tables from the statestore. In order to use the
> add todo that we need to investigate whether to do this for other catalog o
Done


http://gerrit.cloudera.org:8080/#/c/4349/15/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 43:  * them unique ids.
> "may be null" is a better phrasing.
Done


Line 44:  */
> "Set in QueryStmt.analyze()."
Done


Line 45: public class DescriptorTable {
> remove last sentence, that goes without saying.
Done


Line 179:       // inline view of a non-constant select has a non-materialized tuple descriptor
> "checking that"
Done


Line 180:       // in the descriptor table just for type checking, which we need to skip
> "same Table instance" is even more specific (because we're talking about ob
Done


Line 202:       for (SlotDescriptor slotD: tupleDesc.getMaterializedSlots()) {
> single line
Done


http://gerrit.cloudera.org:8080/#/c/4349/15/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java:

Line 201:   public void materializeSlots() {
> single line
done. but not sure if this is what you meant.


http://gerrit.cloudera.org:8080/#/c/4349/15/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

Line 438:       if (descTbl.isSetTupleDescriptors()) {
> precede w/ blank line
Done


Line 447:     if (execRequest.isSetFragments() && !execRequest.fragments.isEmpty()) {
> same here
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has submitted this change and it was merged.

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................


IMPALA-1702: Enforce single-table consistency in query analysis.

Catalogd managed table id generation causes problems when
new updates arrive at frontend while queries are being
analyzed: references to the same table may suddenly refer
to a different version, and different tables may share the
same table id. This causes problems when one table overrides
another one in the thrift descriptor table sent to backend.

This commit removes the table id from the catalog Table
object; instead frontend assigns a unique id to each table
in DescriptorTable.toThrift(). It also implements a
referencedTables_ cache in Analyzer::globalState_ so that
calling Analyzer::getTable() on the same table returns the
same reference for the same query.

Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Reviewed-on: http://gerrit.cloudera.org:8080/4349
Tested-by: Internal Jenkins
Reviewed-by: Alex Behm <al...@cloudera.com>
---
M common/thrift/CatalogObjects.thrift
M common/thrift/Descriptors.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
D fe/src/main/java/org/apache/impala/catalog/TableId.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinTableId.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
27 files changed, 204 insertions(+), 219 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4349/9/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 295:     private final HashMap<TableName, Table> referencedTables_ = Maps.newHashMap();
> I agree it's not required specifically to fix the table id problem, but it 
Thanks for the explanation Alex. Makes sense to me.


http://gerrit.cloudera.org:8080/#/c/4349/9/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 164:     HashMap<TableName, Table> referencedTables = Maps.newHashMap();
> The map is only for validation purposes.
Discussed with Alex offline. Its easier this way. wfm.


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 44:   // Target table in an insert statement.
Is this also the target for UPDATE/DELETE statements?


PS3, Line 45: targetTable_
> The reason for this is that previously I tried to add the cache here and wa
I'm in favor of this change because "referencedTables" was so generic that it was misleading. We should prefer to be as specific as possible. If we need more abstraction, we can add it later if we need it (but chances are we won't).


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................


Patch Set 9:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4349/8/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 154:   /**
> Use /** */ style comment
Done


Line 156:    * Verify table level consistency in the same query by checking references to the same
> are the same instance -> refer to the same table instance.
Done


Line 162:     HashMap<Table, Integer> tableIdMap = Maps.newHashMap();
> also add targetTable_ to this map if needed
Done


Line 173:       if (!tupleDesc.isMaterialized()) continue;
> "old" seems a little odd here, maybe something like checkTable?
Done


http://gerrit.cloudera.org:8080/#/c/4349/8/fe/src/main/java/org/apache/impala/analysis/TableName.java
File fe/src/main/java/org/apache/impala/analysis/TableName.java:

Line 49:   public String getTbl() { return tbl_; }
> Table.getTableName() does the same thing
Done


http://gerrit.cloudera.org:8080/#/c/4349/8/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

Line 421:    *
> DescriptorTable.toThrift() already does this, so we don't need to check it 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 44:   // List of tables to exclude from partition pruning, and thus all partitions are sent to
> The only place that this actually becomes useful is for partition pruning? 
I still don't understand what this has to do with partition pruning. The purpose of this list is correctly described in the original comment. It is to ship the descriptors of these tables to the BE. 

If we don't do this, then the BE will not know where to write data to in an insert because the base and partition paths are part of the table metadata. The purpose really is to ship the table metadata to the BE and I don't see relevance to partition pruning.

Imo, this should be like you had it before. It should be called targetTable_ or insertTable_ or insertTargetTable_ or something like that.


http://gerrit.cloudera.org:8080/#/c/4349/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 141:    * TableId does not exist anymore.
Sorry I got confused, you can remove this sentence. I incorrectly thought the @link below referenced the TableId class.


http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 64:   private static final Object metastoreAccessLock_ = new Object();
> remove which? ?removed INVALID_TABLE_ID. not sure if this is what you wante
The metastoreAccessLock_ is unused. Please remove.


http://gerrit.cloudera.org:8080/#/c/4349/5/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 79:   protected final long id_;
As discussed, thinking about this a little more I think the right fix is to get rid of this id altogether.

The table id is only used for mapping tuple descriptors to tables in the BE - a query local mapping.

So I think the better fix is to remove the table id here and instead assign ids on-the-fly (or establish the mapping some other way) in DescriptorTbl.toThrift().


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency
......................................................................


Patch Set 11:

(2 comments)

Will have a final pass over the commit msg in the next patch set, then I'm ready to +1

http://gerrit.cloudera.org:8080/#/c/4349/11/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

Line 426:         HashSet<Integer> seenTableId = Sets.newHashSet();
seenTableIds


Line 435:       }
Sorry to keep adding to this, but let's also loop through all tuple descriptors and check that the table id (if any) is in seenTableIds.

Also do that for the table sink if any is set.


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency
......................................................................


Patch Set 10:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4349/10/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 155:    * Connect tupleDescriptors to tableDescriptors with unique table ids and get
Returns the thrift representation of this DescriptorTable. Assigns unique ids to all distinct tables and sets them in tuple descriptors as necessary.


http://gerrit.cloudera.org:8080/#/c/4349/10/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

Line 418:    * Our single query table consistency guarantee:
Way too elaborate this comment. just say:

Validates that all tables in the descriptor table of 'request' have a unique id.


Line 422:    * 2. All tables reference to the same version of table in Analyzer::getTable().
Remove this block


Line 426:   private void validateTableConsistency(TExecRequest request) {
validateTableIds


Line 433:         for (TTableDescriptor tableDesc: descTbl.tableDescriptors) {
This code seems pretty elaborate for checking unique ids. Wouldn't it be sufficient to stick the ids into a Set and check whether an id already exists?


Line 437:             throw new IllegalStateException("Failed to verify table consistency for" +
We're not validating table consistency. Error report should be consistent with the new validateTableIds() method name.


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................


Patch Set 13:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4349/13/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

PS13, Line 425: if (execRequest.isSetFragments())
              : 
will clean this.


Line 439:       }
if (execRequest.isSetTupleDescriptors()) {
...
}


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce unique table ID

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce unique table ID
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4349/2/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

Line 188:     private final HashMap<TTableName, TableId> tblIdMap_ =
> +1. Given this happens only with "invalidate metadata", how about we change
this is a good idea! I will try to do this. but for 64bit thing I think it is not necessary if we reuse table ID(like Alex suggested). the reason is that the new largest possible IDs is lower bounded by our current implementation. we do not have a problem now so probably that can go to another patch. For this one, I think I will only focus on enforcing unique table ID.

Thank you both!


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................


Patch Set 6:

(12 comments)

So much better!

http://gerrit.cloudera.org:8080/#/c/4349/6/common/thrift/Types.thrift
File common/thrift/Types.thrift:

Line 22: typedef i64 TTableId
> now i32 is ok I think.
We can also move this typedef into a more appropriate place like Descriptors.thrift because nobody else should care about it now.


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 50:   static public final long TABLE_SINK_ID = 0;
I know it doesn't matter but we typically do "public static final ..."


Line 52:   private long nextTableId = TABLE_SINK_ID + 1;
nextTableId_

change to int?


Line 54:   private long getNextTableId() { return nextTableId++; }
I think we can just remove this and inline at the single caller


Line 158:     // tableIdMap.get(table) == null iff (table == null || table instanceof View)
Maps from base table to its table id.


Line 165:       if (tupleDesc.isMaterialized()) {
Lots of nesting here, can we invert this condition, i.e.

if (!tupleDesc.isMaterialized()) continue;


Line 166:         Table table = tupleDesc.getTable();
Remember the tableId here so avoid some hash lookups.

Long tableId = null;
if (table != null && !(table instanceof View)) {
...

}


Line 168:           if (!tableIdMap.containsKey(table)) {
Use this pattern to avoid multiple lookups:

Entry e = map,.get();
if (e == null) {
  e = nextid;
  map.put();
}


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java:

Line 198:   public TTupleDescriptor toThrift(Long assignedTableId) {
just tableId


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
File fe/src/main/java/org/apache/impala/catalog/HBaseTable.java:

Line 665:   public TTableDescriptor toThriftDescriptor(long id, Set<Long> referencedPartitions) {
tableId


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 1567:   public TTableDescriptor toThriftDescriptor(long id, Set<Long> referencedPartitions) {
tableId


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
File fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java:

Line 68:   public TTableDescriptor toThriftDescriptor(long id, Set<Long> referencedPartitions) {
tableId

(and elsewhere)


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Hello Bharath Vissapragada,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4349

to look at the new patch set (#13).

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................

IMPALA-1702: Enforce single-table consistency in query analysis.

Catalogd managed table id generation causes problems when
new updates arrive at frontend while queries are being
analyzed: references to the same table may suddenly refer
to a different version, and different tables may share the
same table id. This causes problems when one table overrides
another one in the thrift descriptor table sent to backend.

This commit removes the table id from the catalog Table
object; instead frontend assigns a unique id to each table
in DescriptorTable.toThrift(). It also implements a
referecedTables_ cache in Analyzer::globalState_ so that
calling Analyzer::getTable() on the same table returns the
same reference for the same query.

Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
D fe/src/main/java/org/apache/impala/catalog/TableId.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinTableId.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
26 files changed, 194 insertions(+), 214 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/13
-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1702: Enforce unique table ID

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1702: Enforce unique table ID
......................................................................


Patch Set 2:

(1 comment)

I think we should avoid both forms of id inconsistency: Different tables with the same id, or the same table with different ids.

http://gerrit.cloudera.org:8080/#/c/4349/2/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

Line 188:     private final HashMap<TTableName, TableId> tblIdMap_ =
Why do we need another map? Don't we already have a map of existing tables that we can use to get the current id?

I think it could actually be problematic to only rely on the TTableName for identification. The user could drop+add the same table and I think in that case a new id should be assigned. Thoughts?


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................


Patch Set 16:

solved conflict first. addressing comment next

-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Hello Marcel Kornacker, Bharath Vissapragada, Alex Behm,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4349

to look at the new patch set (#17).

Change subject: IMPALA-1702: Enforce single-table consistency in query analysis.
......................................................................

IMPALA-1702: Enforce single-table consistency in query analysis.

Catalogd managed table id generation causes problems when
new updates arrive at frontend while queries are being
analyzed: references to the same table may suddenly refer
to a different version, and different tables may share the
same table id. This causes problems when one table overrides
another one in the thrift descriptor table sent to backend.

This commit removes the table id from the catalog Table
object; instead frontend assigns a unique id to each table
in DescriptorTable.toThrift(). It also implements a
referencedTables_ cache in Analyzer::globalState_ so that
calling Analyzer::getTable() on the same table returns the
same reference for the same query.

Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
---
M common/thrift/CatalogObjects.thrift
M common/thrift/Descriptors.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
D fe/src/main/java/org/apache/impala/catalog/TableId.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinTableId.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
27 files changed, 204 insertions(+), 219 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/17
-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
......................................................................


Patch Set 7:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/4349/6/common/thrift/Types.thrift
File common/thrift/Types.thrift:

Line 22: typedef i32 TPlanNodeId
> We can also move this typedef into a more appropriate place like Descriptor
I think it is ok to push it here to stick with others. just make it an int.


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 70: import org.apache.impala.util.DisjointSet;
> will remove
Done


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 50:   public static final int TABLE_SINK_ID = 0;
> I know it doesn't matter but we typically do "public static final ..."
Done


Line 52:   private int nextTableId_ = TABLE_SINK_ID + 1;
> nextTableId_
Done


Line 54:   public TupleDescriptor createTupleDescriptor(String debugName) {
> I think we can just remove this and inline at the single caller
Done


Line 158:     if (targetTable_ != null) tableIdMap.put(targetTable_, TABLE_SINK_ID);
> Maps from base table to its table id.
Done


Line 165:       Integer tableId = tableIdMap.get(table);
> Lots of nesting here, can we invert this condition, i.e.
Done


Line 166:       if (table != null && !(table instanceof View) && tableId == null) {
> Remember the tableId here so avoid some hash lookups.
Done


Line 168:         tableIdMap.put(table, tableId);
> Use this pattern to avoid multiple lookups:
Done


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java:

Line 198:   public TTupleDescriptor toThrift(Integer tableId) {
> just tableId
Done


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 34: import java.util.concurrent.locks.ReentrantReadWriteLock;
> will remove
Done


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
File fe/src/main/java/org/apache/impala/catalog/HBaseTable.java:

Line 665:   public TTableDescriptor toThriftDescriptor(int tableId, Set<Long> referencedPartitions) {
> tableId
Done


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 1567:   public TTableDescriptor toThriftDescriptor(int tableId, Set<Long> referencedPartitions) {
> tableId
Done


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
File fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java:

Line 68:   public TTableDescriptor toThriftDescriptor(int tableId, Set<Long> referencedPartitions) {
> tableId
Done


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

Line 38: import org.apache.impala.catalog.KuduTable;
> not intentional.. will revert next patch!
Done


PS6, Line 1105:       String db = insertStmt.getTargetTableName().getDb();
              :         finalizeParams.setTable_db(db == null ? queryCtx.session.database : db);
              :         HdfsTable hdfsTable = (HdfsTable) insertStmt.getTargetTable();
              :         finalizeParams.setHdfs_base_dir(hdfsTable.getHdfsBaseDir());
              :         finalizeParams.setStaging_dir(
              :             hdfsTable.getHdfsBaseDir() + "/_impala_insert_staging");
              :      
> will update
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes