You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Gabor Kaszab (Code Review)" <ge...@cloudera.org> on 2019/07/27 10:51:38 UTC

[Impala-ASF-CR] IMPALA-8600 WIP: Refresh transactional tables

Gabor Kaszab has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13938


Change subject: IMPALA-8600 WIP: Refresh transactional tables
......................................................................

IMPALA-8600 WIP: Refresh transactional tables

Refreshing a subset of partitions in a transactional table might lead
us to an inconsistent state of that transactional table. As a fix
user initiated partition refreshes are no longer allowed on ACID
tables. Additionally, a refresh partition Metastore event actually
triggers a refresh on the whole ACID table.

An optimisation is implemented to check the locally latest table
level writeId, fetch the same from HMS and do a refresh only if they
don't match.
This couldn't be done for partitioned tables as apparently Hive
doesn't update the table level writeId if the transactional table is
partitioned. Similary, checking the writeId for each partition and
refresh only the ones where the writeId is not up to date is not
feasible either as there is no writeId update when Hive makes schema
changes like adding a column neither on table level or on partition
level. So after a adding a column in Hive to a partitioned ACID table
and refreshing that table in Impala, still Impala wouldn't see the
new column. Hence, I unconditionally refresh the whole table if it's
ACID and partitioned. Note, that for non-partitioned ACID tables Hive
updates the table level writeId even for schema changes.

Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
---
M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M tests/custom_cluster/test_event_processing.py
7 files changed, 202 insertions(+), 75 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/13938/1
-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>

[Impala-ASF-CR] IMPALA-8600: Refresh transactional tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13938 )

Change subject: IMPALA-8600: Refresh transactional tables
......................................................................


Patch Set 7: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/4713/


-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Aug 2019 08:45:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8600 WIP: Refresh transactional tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13938 )

Change subject: IMPALA-8600 WIP: Refresh transactional tables
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13938/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/13938/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1379
PS1, Line 1379:           // HMS partition objects in an add_partition event. We try to do the same here by
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/13938/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1391
PS1, Line 1391:                   + "as table was not present in the catalog.", getFullyQualifiedTblName());
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/13938/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1493
PS1, Line 1493:             infoLog("Table {} partition {} has been refreshed", getFullyQualifiedTblName(),
line too long (91 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sat, 27 Jul 2019 10:52:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8600: Refresh transactional tables

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13938

to look at the new patch set (#7).

Change subject: IMPALA-8600: Refresh transactional tables
......................................................................

IMPALA-8600: Refresh transactional tables

Refreshing a subset of partitions in a transactional table might lead
us to an inconsistent state of that transactional table. As a fix
user initiated partition refreshes are no longer allowed on ACID
tables. Additionally, a refresh partition Metastore event actually
triggers a refresh on the whole ACID table.

An optimisation is implemented to check the locally latest table
level writeId, fetch the same from HMS and do a refresh only if they
don't match.
This couldn't be done for partitioned tables as apparently Hive
doesn't update the table level writeId if the transactional table is
partitioned. Similarly, checking the writeId for each partition and
refresh only the ones where the writeId is not up to date is not
feasible either as there is no writeId update when Hive makes schema
changes like adding a column neither on table level or on partition
level. So after a adding a column in Hive to a partitioned ACID table
and refreshing that table in Impala, still Impala wouldn't see the
new column. Hence, I unconditionally refresh the whole table if it's
ACID and partitioned. Note, that for non-partitioned ACID tables Hive
updates the table level writeId even for schema changes.

Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
---
M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/workloads/functional-query/queries/QueryTest/acid.test
M tests/custom_cluster/test_event_processing.py
8 files changed, 220 insertions(+), 85 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/13938/7
-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8600: Refresh transactional tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13938 )

Change subject: IMPALA-8600: Refresh transactional tables
......................................................................


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4713/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Aug 2019 07:06:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8600 Refresh transactional tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/13938 )

Change subject: IMPALA-8600 Refresh transactional tables
......................................................................


Patch Set 4:

(10 comments)

Seems good to me, only found a couple of nits

http://gerrit.cloudera.org:8080/#/c/13938/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13938/4//COMMIT_MSG@20
PS4, Line 20: Similary
nit: Similarly


http://gerrit.cloudera.org:8080/#/c/13938/4//COMMIT_MSG@23
PS4, Line 23: changes like adding a column neither on table level or on partition
Offline you mentioned lastDdlTime. Maybe you can add a comment or TODO about it. It's also OK to wait for Hive to fix the writeId bug.


http://gerrit.cloudera.org:8080/#/c/13938/4//COMMIT_MSG@28
PS4, Line 28: updates the table level writeId even for schema changes.
You mentioned that it is a Hive-bug. Do we have an open Jira for that?


http://gerrit.cloudera.org:8080/#/c/13938/4/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java
File fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java:

http://gerrit.cloudera.org:8080/#/c/13938/4/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java@161
PS4, Line 161: Refresh
nit: Refreshing a partition / REFRESH on a partition ?


http://gerrit.cloudera.org:8080/#/c/13938/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/13938/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1900
PS4, Line 1900: 
nit: Please add comment


http://gerrit.cloudera.org:8080/#/c/13938/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/13938/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@733
PS4, Line 733: org.apache.hadoop.hive.metastore.api.Partition
We imported it above.


http://gerrit.cloudera.org:8080/#/c/13938/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1370
PS4, Line 1370:  
nit: on


http://gerrit.cloudera.org:8080/#/c/13938/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1473
PS4, Line 1473:  
nit: on


http://gerrit.cloudera.org:8080/#/c/13938/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1565
PS4, Line 1565:         // Reload the whole table if it's a transactional table.
              :         if (AcidUtils.isTransactionalTable(msTbl_.getParameters())) {
              :           if (!catalog_.reloadTableIfExists(dbName_, tblName_,
              :               "processing DROP_PARTITION event from HMS")) {
              :             debugLog("Automatic refresh table {} failed as the table is not "
              :                 + "present either in catalog or metastore.", getFullyQualifiedTblName());
              :           } else {
              :             infoLog("Table {} has been refreshed after drop_partition.",
              :                 getFullyQualifiedTblName());
              :           }
              :         } else {
nit: the above changes are quite similar. Can we put them into some helper method?


http://gerrit.cloudera.org:8080/#/c/13938/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/13938/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3611
PS4, Line 3611: isTableLoadedInCatalog
nit: I think this variable became unnecessary.



-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Jul 2019 14:37:32 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8600 Refresh transactional tables

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/13938 )

Change subject: IMPALA-8600 Refresh transactional tables
......................................................................


Patch Set 5:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/13938/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13938/4//COMMIT_MSG@20
PS4, Line 20: Similarl
> nit: Similarly
Done


http://gerrit.cloudera.org:8080/#/c/13938/4//COMMIT_MSG@23
PS4, Line 23: changes like adding a column neither on table level or on partition
> Offline you mentioned lastDdlTime. Maybe you can add a comment or TODO abou
I have added a TODO to CatalogOpExecutor. I'll update that comment with Jira IDs. Is that ok?


http://gerrit.cloudera.org:8080/#/c/13938/4//COMMIT_MSG@28
PS4, Line 28: updates the table level writeId even for schema changes.
> You mentioned that it is a Hive-bug. Do we have an open Jira for that?
Added it also to the TODO comment in CatalogOpExecutor.


http://gerrit.cloudera.org:8080/#/c/13938/4/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java
File fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java:

http://gerrit.cloudera.org:8080/#/c/13938/4/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java@161
PS4, Line 161: Refresh
> nit: Refreshing a partition / REFRESH on a partition ?
Done


http://gerrit.cloudera.org:8080/#/c/13938/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/13938/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1900
PS4, Line 1900: 
> nit: Please add comment
Done


http://gerrit.cloudera.org:8080/#/c/13938/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/13938/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@733
PS4, Line 733: return true;
> We imported it above.
Done


http://gerrit.cloudera.org:8080/#/c/13938/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1370
PS4, Line 1370: 
> nit: on
Done


http://gerrit.cloudera.org:8080/#/c/13938/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1473
PS4, Line 1473: 
> nit: on
Done


http://gerrit.cloudera.org:8080/#/c/13938/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1565
PS4, Line 1565:           // fails, we throw a MetastoreNotificationNeedsInvalidateException
              :           infoLog("{} partitions dropped from table {}. Trying "
              :               + "to refresh.", droppedPartitions_.size(), getFullyQualifiedTblName());
              :           for (Map<String, String> partSpec : droppedPartitions_) {
              :             List<TPartitionKeyValue> tPartSpec = new ArrayList<>(partSpec.size());
              :             for (Map.Entry<String, String> entry : partSpec.entrySet()) {
              :               tPartSpec.add(new TPartitionKeyValue(entry.getKey(), entry.getValue()));
              :             }
              :             if (!catalog_.reloadPartitionIfExists(dbName_, tblName_, tPartSpec,
              :                 "processing DROP_PARTITION event from HMS")) {
              :               de
> nit: the above changes are quite similar. Can we put them into some helper 
Done


http://gerrit.cloudera.org:8080/#/c/13938/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/13938/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3611
PS4, Line 3611: isTableLoadedInCatalog
> nit: I think this variable became unnecessary.
It is necessary to capture if the table was loaded or not before we call getExistingTable() below. If it wasn't loaded then calling getExistingTable() would load it from HMS in the background and no need to do the refresh again.



-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 31 Jul 2019 09:55:03 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8600 Refresh transactional tables

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13938

to look at the new patch set (#4).

Change subject: IMPALA-8600 Refresh transactional tables
......................................................................

IMPALA-8600 Refresh transactional tables

Refreshing a subset of partitions in a transactional table might lead
us to an inconsistent state of that transactional table. As a fix
user initiated partition refreshes are no longer allowed on ACID
tables. Additionally, a refresh partition Metastore event actually
triggers a refresh on the whole ACID table.

An optimisation is implemented to check the locally latest table
level writeId, fetch the same from HMS and do a refresh only if they
don't match.
This couldn't be done for partitioned tables as apparently Hive
doesn't update the table level writeId if the transactional table is
partitioned. Similary, checking the writeId for each partition and
refresh only the ones where the writeId is not up to date is not
feasible either as there is no writeId update when Hive makes schema
changes like adding a column neither on table level or on partition
level. So after a adding a column in Hive to a partitioned ACID table
and refreshing that table in Impala, still Impala wouldn't see the
new column. Hence, I unconditionally refresh the whole table if it's
ACID and partitioned. Note, that for non-partitioned ACID tables Hive
updates the table level writeId even for schema changes.

Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
---
M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M tests/custom_cluster/test_event_processing.py
7 files changed, 215 insertions(+), 75 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/13938/4
-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8600 Refresh transactional tables

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13938

to look at the new patch set (#3).

Change subject: IMPALA-8600 Refresh transactional tables
......................................................................

IMPALA-8600 Refresh transactional tables

Refreshing a subset of partitions in a transactional table might lead
us to an inconsistent state of that transactional table. As a fix
user initiated partition refreshes are no longer allowed on ACID
tables. Additionally, a refresh partition Metastore event actually
triggers a refresh on the whole ACID table.

An optimisation is implemented to check the locally latest table
level writeId, fetch the same from HMS and do a refresh only if they
don't match.
This couldn't be done for partitioned tables as apparently Hive
doesn't update the table level writeId if the transactional table is
partitioned. Similary, checking the writeId for each partition and
refresh only the ones where the writeId is not up to date is not
feasible either as there is no writeId update when Hive makes schema
changes like adding a column neither on table level or on partition
level. So after a adding a column in Hive to a partitioned ACID table
and refreshing that table in Impala, still Impala wouldn't see the
new column. Hence, I unconditionally refresh the whole table if it's
ACID and partitioned. Note, that for non-partitioned ACID tables Hive
updates the table level writeId even for schema changes.

Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
---
M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M tests/custom_cluster/test_event_processing.py
7 files changed, 220 insertions(+), 77 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/13938/3
-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8600 Refresh transactional tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13938 )

Change subject: IMPALA-8600 Refresh transactional tables
......................................................................


Patch Set 2:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/4076/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Jul 2019 13:32:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8600 Refresh transactional tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13938 )

Change subject: IMPALA-8600 Refresh transactional tables
......................................................................


Patch Set 3:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/4077/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Jul 2019 13:40:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8600: Refresh transactional tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13938 )

Change subject: IMPALA-8600: Refresh transactional tables
......................................................................


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4715/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Aug 2019 10:58:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8600 Refresh transactional tables

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/13938 )

Change subject: IMPALA-8600 Refresh transactional tables
......................................................................


Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/13938/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13938/6//COMMIT_MSG@7
PS6, Line 7: IMPALA-8600 Refresh transactional tables
nit: add : for sake of consistency with other commits


http://gerrit.cloudera.org:8080/#/c/13938/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/13938/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3531
PS6, Line 3531: getMetaStoreTable
About the name: "get" does tell me whether the table is get from metastore or we just convert "Table tbl" to org.apache.hadoop.hive.metastore.api.Table. I would prefer something like "getTableFromMetaStore" or "fetchMetaStoreTable".


http://gerrit.cloudera.org:8080/#/c/13938/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3532
PS6, Line 3532: tbl
If I see correctly tbl is only used in a precondition check.


http://gerrit.cloudera.org:8080/#/c/13938/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3626
PS6, Line 3626:                   if (hmsTbl == null) {
              :                       throw new CatalogException("Unable to get table " +
I am 100% sure, but it makes sense to me to throw the exception at line 3671 similarly to other cases of "missing tables".


http://gerrit.cloudera.org:8080/#/c/13938/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3628
PS6, Line 3628:                       tblName.toString() + " from HMS. A concurrent operation might " +
              :                       "have removed it.");
nit: indentation +4


http://gerrit.cloudera.org:8080/#/c/13938/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3632
PS6, Line 3632:                   if (!hdfsTable.isPartitioned() &&
Is this path tested? Even if there are "unneeded refreshes" among the ACID tests, I would add one explicitly to test REFRESH on an unchanged table, e.g. in https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/acid.test


http://gerrit.cloudera.org:8080/#/c/13938/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3635
PS6, Line 3635:                     // No need to refresh the table if the local writeId equals to the
It could be mentioned here that Hive seems to increase writeId for metadata operations too



-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 31 Jul 2019 13:45:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8600: Refresh transactional tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13938 )

Change subject: IMPALA-8600: Refresh transactional tables
......................................................................


Patch Set 8: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Aug 2019 17:16:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8600: Refresh transactional tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13938 )

Change subject: IMPALA-8600: Refresh transactional tables
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4100/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 31 Jul 2019 15:31:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8600 Refresh transactional tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/13938 )

Change subject: IMPALA-8600 Refresh transactional tables
......................................................................


Patch Set 5: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13938/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13938/4//COMMIT_MSG@23
PS4, Line 23: changes like adding a column neither on table level or on partition
> I have added a TODO to CatalogOpExecutor. I'll update that comment with Jir
Yeah, sure.


http://gerrit.cloudera.org:8080/#/c/13938/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/13938/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@615
PS5, Line 615: present
nit: present in catalog neither in metastore ?



-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 31 Jul 2019 12:38:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8600 Refresh transactional tables

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13938

to look at the new patch set (#2).

Change subject: IMPALA-8600 Refresh transactional tables
......................................................................

IMPALA-8600 Refresh transactional tables

Refreshing a subset of partitions in a transactional table might lead
us to an inconsistent state of that transactional table. As a fix
user initiated partition refreshes are no longer allowed on ACID
tables. Additionally, a refresh partition Metastore event actually
triggers a refresh on the whole ACID table.

An optimisation is implemented to check the locally latest table
level writeId, fetch the same from HMS and do a refresh only if they
don't match.
This couldn't be done for partitioned tables as apparently Hive
doesn't update the table level writeId if the transactional table is
partitioned. Similary, checking the writeId for each partition and
refresh only the ones where the writeId is not up to date is not
feasible either as there is no writeId update when Hive makes schema
changes like adding a column neither on table level or on partition
level. So after a adding a column in Hive to a partitioned ACID table
and refreshing that table in Impala, still Impala wouldn't see the
new column. Hence, I unconditionally refresh the whole table if it's
ACID and partitioned. Note, that for non-partitioned ACID tables Hive
updates the table level writeId even for schema changes.

Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
---
M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M tests/custom_cluster/test_event_processing.py
7 files changed, 219 insertions(+), 77 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/13938/2
-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8600: Refresh transactional tables

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/13938 )

Change subject: IMPALA-8600: Refresh transactional tables
......................................................................


Patch Set 7:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/13938/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13938/6//COMMIT_MSG@7
PS6, Line 7: IMPALA-8600: Refresh transactional tables
> nit: add : for sake of consistency with other commits
Done


http://gerrit.cloudera.org:8080/#/c/13938/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/13938/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3531
PS6, Line 3531: getTableFromMetaS
> About the name: "get" does tell me whether the table is get from metastore 
I went for getTableFromMetaStore. Done


http://gerrit.cloudera.org:8080/#/c/13938/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3532
PS6, Line 3532: ame
> If I see correctly tbl is only used in a precondition check.
Indeed. Dropped.


http://gerrit.cloudera.org:8080/#/c/13938/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3626
PS6, Line 3626:                       throw new TableNotFoundException("Table not found: " +
              :                           tblName.toString());
> I am 100% sure, but it makes sense to me to throw the exception at line 367
Done


http://gerrit.cloudera.org:8080/#/c/13938/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3628
PS6, Line 3628:                   }
              :                   HdfsTable hdfsTable = (H
> nit: indentation +4
Done


http://gerrit.cloudera.org:8080/#/c/13938/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3632
PS6, Line 3632:                       MetastoreShim.getWriteIdFromMSTable(hmsTbl)) {
> Is this path tested? Even if there are "unneeded refreshes" among the ACID 
Done


http://gerrit.cloudera.org:8080/#/c/13938/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3635
PS6, Line 3635:                     LOG.debug("Skip reloading table " + tblName.toString() +
> It could be mentioned here that Hive seems to increase writeId for metadata
That is the expected behaviour from Hive, I wouldn't comment it here.



-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 31 Jul 2019 14:51:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8600 Refresh transactional tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13938 )

Change subject: IMPALA-8600 Refresh transactional tables
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13938/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/13938/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1379
PS2, Line 1379:           // HMS partition objects in an add_partition event. We try to do the same here by
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/13938/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1391
PS2, Line 1391:                   + "as table was not present in the catalog.", getFullyQualifiedTblName());
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/13938/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1493
PS2, Line 1493:             infoLog("Table {} partition {} has been refreshed", getFullyQualifiedTblName(),
line too long (91 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Jul 2019 12:52:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8600 Refresh transactional tables

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/13938 )

Change subject: IMPALA-8600 Refresh transactional tables
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13938/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/13938/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@615
PS5, Line 615: present
> nit: present in catalog neither in metastore ?
I think this is correct as it is:
"not present either ... or ..."
Or it could be rephrased as:
"present neither ... nor ..."

I prefer the first one. There is a missing "in" though.



-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 31 Jul 2019 12:57:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8600 Refresh transactional tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13938 )

Change subject: IMPALA-8600 Refresh transactional tables
......................................................................


Patch Set 4:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/4078/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Jul 2019 14:14:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8600 Refresh transactional tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13938 )

Change subject: IMPALA-8600 Refresh transactional tables
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4094/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 31 Jul 2019 10:35:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8600 WIP: Refresh transactional tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13938 )

Change subject: IMPALA-8600 WIP: Refresh transactional tables
......................................................................


Patch Set 1:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/4047/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sat, 27 Jul 2019 11:31:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8600 Refresh transactional tables

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/13938 )

Change subject: IMPALA-8600 Refresh transactional tables
......................................................................


Patch Set 6:

> (7 comments)

I am 100% sure -> I am NOT 100% sure :D


-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 31 Jul 2019 13:45:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8600 Refresh transactional tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13938 )

Change subject: IMPALA-8600 Refresh transactional tables
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4097/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 31 Jul 2019 13:38:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8600 Refresh transactional tables

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13938

to look at the new patch set (#5).

Change subject: IMPALA-8600 Refresh transactional tables
......................................................................

IMPALA-8600 Refresh transactional tables

Refreshing a subset of partitions in a transactional table might lead
us to an inconsistent state of that transactional table. As a fix
user initiated partition refreshes are no longer allowed on ACID
tables. Additionally, a refresh partition Metastore event actually
triggers a refresh on the whole ACID table.

An optimisation is implemented to check the locally latest table
level writeId, fetch the same from HMS and do a refresh only if they
don't match.
This couldn't be done for partitioned tables as apparently Hive
doesn't update the table level writeId if the transactional table is
partitioned. Similarly, checking the writeId for each partition and
refresh only the ones where the writeId is not up to date is not
feasible either as there is no writeId update when Hive makes schema
changes like adding a column neither on table level or on partition
level. So after a adding a column in Hive to a partitioned ACID table
and refreshing that table in Impala, still Impala wouldn't see the
new column. Hence, I unconditionally refresh the whole table if it's
ACID and partitioned. Note, that for non-partitioned ACID tables Hive
updates the table level writeId even for schema changes.

Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
---
M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M tests/custom_cluster/test_event_processing.py
7 files changed, 214 insertions(+), 85 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/13938/5
-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8600: Refresh transactional tables

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/13938 )

Change subject: IMPALA-8600: Refresh transactional tables
......................................................................


Patch Set 8: Code-Review+2

Thanks for the changes!


-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Aug 2019 11:29:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8600 Refresh transactional tables

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13938

to look at the new patch set (#6).

Change subject: IMPALA-8600 Refresh transactional tables
......................................................................

IMPALA-8600 Refresh transactional tables

Refreshing a subset of partitions in a transactional table might lead
us to an inconsistent state of that transactional table. As a fix
user initiated partition refreshes are no longer allowed on ACID
tables. Additionally, a refresh partition Metastore event actually
triggers a refresh on the whole ACID table.

An optimisation is implemented to check the locally latest table
level writeId, fetch the same from HMS and do a refresh only if they
don't match.
This couldn't be done for partitioned tables as apparently Hive
doesn't update the table level writeId if the transactional table is
partitioned. Similarly, checking the writeId for each partition and
refresh only the ones where the writeId is not up to date is not
feasible either as there is no writeId update when Hive makes schema
changes like adding a column neither on table level or on partition
level. So after a adding a column in Hive to a partitioned ACID table
and refreshing that table in Impala, still Impala wouldn't see the
new column. Hence, I unconditionally refresh the whole table if it's
ACID and partitioned. Note, that for non-partitioned ACID tables Hive
updates the table level writeId even for schema changes.

Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
---
M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M tests/custom_cluster/test_event_processing.py
7 files changed, 214 insertions(+), 85 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/13938/6
-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8600: Refresh transactional tables

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13938 )

Change subject: IMPALA-8600: Refresh transactional tables
......................................................................

IMPALA-8600: Refresh transactional tables

Refreshing a subset of partitions in a transactional table might lead
us to an inconsistent state of that transactional table. As a fix
user initiated partition refreshes are no longer allowed on ACID
tables. Additionally, a refresh partition Metastore event actually
triggers a refresh on the whole ACID table.

An optimisation is implemented to check the locally latest table
level writeId, fetch the same from HMS and do a refresh only if they
don't match.
This couldn't be done for partitioned tables as apparently Hive
doesn't update the table level writeId if the transactional table is
partitioned. Similarly, checking the writeId for each partition and
refresh only the ones where the writeId is not up to date is not
feasible either as there is no writeId update when Hive makes schema
changes like adding a column neither on table level or on partition
level. So after a adding a column in Hive to a partitioned ACID table
and refreshing that table in Impala, still Impala wouldn't see the
new column. Hence, I unconditionally refresh the whole table if it's
ACID and partitioned. Note, that for non-partitioned ACID tables Hive
updates the table level writeId even for schema changes.

Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Reviewed-on: http://gerrit.cloudera.org:8080/13938
Reviewed-by: Csaba Ringhofer <cs...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/workloads/functional-query/queries/QueryTest/acid.test
M tests/custom_cluster/test_event_processing.py
8 files changed, 220 insertions(+), 85 deletions(-)

Approvals:
  Csaba Ringhofer: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/13938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Gerrit-Change-Number: 13938
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>