Posted to reviews@impala.apache.org by "Tianyi Wang (Code Review)" <ge...@cloudera.org> on 2018/06/22 01:18:46 UTC

[Impala-ASF-CR] IMPALA-3040: Remove cache directive before dropping a table

Tianyi Wang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10792 )


Change subject: IMPALA-3040: Remove cache directive before dropping a table
......................................................................

IMPALA-3040: Remove cache directive before dropping a table

One way to hit IMPALA-3040 is to drop a table while the catalog is
loading it. If the HDFS files of a partition are removed while the
partition is being loaded, the catalog object ends up in an
inconsistent state: the catalog fails to recognize some cached
partitions and does not remove their cache directives. This patch
removes the cache directives first to avoid this race condition.

Change-Id: Id7701a499405e961456adea63f3592b43bd69170
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
1 file changed, 7 insertions(+), 6 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/10792/1
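The ordering this first patch set relies on can be sketched as follows. This is a toy model, not Impala's code: `CacheAdmin`, `Metastore`, and `dropTableSafely()` are hypothetical names. It only shows every cache directive being removed before the HMS drop call is issued, so a background load racing with the file deletion can no longer hide cached partitions from the uncache step.

```java
import java.util.ArrayList;
import java.util.List;

class DropTableOrdering {
  // Hypothetical stand-in for HDFS cache administration.
  interface CacheAdmin {
    void removeDirective(String path);
  }

  // Hypothetical stand-in for the HMS client.
  interface Metastore {
    void dropTable(String db, String table);
  }

  // Uncaches the partition directories and the table directory first, and
  // only then asks HMS to drop the table (which deletes the files).
  static void dropTableSafely(CacheAdmin cache, Metastore hms, String db,
      String table, String tableDir, List<String> cachedPartitionDirs) {
    for (String dir : cachedPartitionDirs) {
      cache.removeDirective(dir);
    }
    cache.removeDirective(tableDir);
    hms.dropTable(db, table);
  }

  public static void main(String[] args) {
    List<String> calls = new ArrayList<>();
    dropTableSafely(
        path -> calls.add("uncache " + path),
        (db, tbl) -> calls.add("drop " + db + "." + tbl),
        "cachedb", "cached_tbl_part",
        "/test-warehouse/cachedb.db/cached_tbl_part",
        List.of("/test-warehouse/cachedb.db/cached_tbl_part/j=2"));
    // Prints the two uncache calls, then the drop call.
    calls.forEach(System.out::println);
  }
}
```

The trade-off of this ordering is that if the HMS drop subsequently fails, the table survives without its cache directives, which is usually preferable to leaking directives for files that no longer exist.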
-- 
To view, visit http://gerrit.cloudera.org:8080/10792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-3040: Remove cache directives if a partition is dropped externally

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directives if a partition is dropped externally
......................................................................


Patch Set 12:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/29/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 12
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Tue, 24 Jul 2018 18:33:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3040: Remove cache directives if a partition is dropped externally

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directives if a partition is dropped externally
......................................................................


Patch Set 7: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/2778/


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 7
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Jul 2018 23:56:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3040: Remove cache directives if a partition is dropped externally

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directives if a partition is dropped externally
......................................................................


Patch Set 7:

The problem is that dropPartition() is also called on dirty partitions, in which case we should not change anything in the NameNode. Maybe we should further differentiate these two cases, but that's a non-trivial code change.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 7
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Thu, 19 Jul 2018 22:41:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3040: Remove cache directives if a partition is dropped externally

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directives if a partition is dropped externally
......................................................................


Patch Set 7: Code-Review+2


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 7
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Jul 2018 20:32:57 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3040: Remove cache directives if a partition is dropped externally

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directives if a partition is dropped externally
......................................................................


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2778/ DRY_RUN=false


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 7
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Jul 2018 20:32:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3040: Remove cache directives if a partition is dropped externally

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directives if a partition is dropped externally
......................................................................


Patch Set 9: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10792/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10792/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1375
PS8, Line 1375: Set<String> msPartitionNames = Sets.newHashSet();
              :     msPartitionNames.addAll(client.listPartitionNames(db_.getName(), name_, (short) -1));
              :     // Names of loaded partitions in this table
> I changed the definition of it instead of removing. We have to drop the par
Nice catch.


http://gerrit.cloudera.org:8080/#/c/10792/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10792/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1414
PS9, Line 1414:     dropPartitions(dirtyPartitions, false);
Add a comment that dirtyPartitions are reloaded and hence their cache directives are not dropped. That makes the code easier for readers to understand.



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 9
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Mon, 23 Jul 2018 22:38:47 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3040: Remove cache directives if a partition is dropped externally

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directives if a partition is dropped externally
......................................................................


Patch Set 12: Verified+1


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 12
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Tue, 24 Jul 2018 17:59:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3040: Remove cache directives if a partition is dropped externally

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directives if a partition is dropped externally
......................................................................

IMPALA-3040: Remove cache directives if a partition is dropped externally

HdfsTable.dropPartition() doesn't uncache the partition right now. If
the partition is dropped from Hive and refreshed in Impala, the
partition is removed from the catalog but its cache directive
remains. Because Impala uses the HMS client directly to drop a
table/database, the cache directive won't be removed, even when the
table is dropped in Impala, if background loading runs concurrently
with the HMS client RPC call. This patch removes the cache directive in
dropPartition() if the partition has been removed from HMS.

Change-Id: Id7701a499405e961456adea63f3592b43bd69170
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/query_test/test_hdfs_caching.py
3 files changed, 49 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/10792/12
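A minimal sketch of the distinction this patch set draws, using a toy in-memory table rather than the real HdfsTable. Partitions that the catalog knows about but HMS no longer lists are dropped and uncached; dirty partitions are dropped without touching the NameNode because they are reloaded right away. `ToyHdfsTable`, `Partition`, and the `uncache` callback are hypothetical; only the boolean second argument mirrors the `dropPartitions(dirtyPartitions, false)` call quoted in the review.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

class ToyHdfsTable {
  // Hypothetical partition record.
  static final class Partition {
    final String name;          // e.g. "j=2"
    final String location;      // HDFS directory of the partition
    final boolean markedCached;
    final boolean dirty;        // needs to be reloaded

    Partition(String name, String location, boolean markedCached, boolean dirty) {
      this.name = name;
      this.location = location;
      this.markedCached = markedCached;
      this.dirty = dirty;
    }
  }

  final Map<String, Partition> partitions = new LinkedHashMap<>();
  private final Consumer<String> uncache;  // removes the directive for a path

  ToyHdfsTable(Consumer<String> uncache) {
    this.uncache = uncache;
  }

  // Removes partitions from the catalog. The NameNode cache directive is
  // only removed when removeCacheDirectives is true.
  void dropPartitions(List<Partition> toDrop, boolean removeCacheDirectives) {
    for (Partition p : toDrop) {
      if (removeCacheDirectives && p.markedCached) uncache.accept(p.location);
      partitions.remove(p.name);
    }
  }

  // Reconciles the catalog with the partition names HMS currently reports.
  void updatePartitionsFromHms(Set<String> msPartitionNames) {
    List<Partition> removed = new ArrayList<>();
    List<Partition> dirty = new ArrayList<>();
    for (Partition p : partitions.values()) {
      if (!msPartitionNames.contains(p.name)) {
        removed.add(p);
      } else if (p.dirty) {
        dirty.add(p);
      }
    }
    // Gone from HMS: drop from the catalog and uncache.
    dropPartitions(removed, true);
    // Dirty partitions are reloaded afterwards (omitted here), so their
    // cache directives must be left in place.
    dropPartitions(dirty, false);
  }

  public static void main(String[] args) {
    List<String> uncached = new ArrayList<>();
    ToyHdfsTable table = new ToyHdfsTable(uncached::add);
    table.partitions.put("j=1", new Partition("j=1", "/tbl/j=1", true, true));
    table.partitions.put("j=2", new Partition("j=2", "/tbl/j=2", true, false));
    // Partition j=2 was dropped from Hive; j=1 still exists but is dirty.
    table.updatePartitionsFromHms(new HashSet<>(List.of("j=1")));
    System.out.println(uncached);  // [/tbl/j=2]
  }
}
```

A boolean parameter keeps the change small; an enum naming the reason for the drop would make call sites more self-describing, at the cost of touching more code.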
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 12
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-3040: Remove cache directive before dropping a table

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directive before dropping a table
......................................................................


Patch Set 2:

Your theory seems plausible to me.

> I think table is dropped concurrently with

Do you know what in test_caching_ddl() is calling this drop (drop db cascade/drop table etc.)? It does not seem to be using unique_db_fixture and runs serially, so I'm wondering what is triggering a race between load() and drop().

> Now the question is whether listPartitionNames() returns an empty list if the table doesn't exist. The first thing to notice is that listPartitionNames() can throw NoSuchObjectException, so intuitively that should happen if the table doesn't exist, but it doesn't. The relevant code is at https://github.com/apache/hive/blob/966b83e3b9123bb455572d47878601d60b86999e/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L4717. The NoSuchObjectException is only thrown by fireReadTablePreEvent(), which is some kind of hook mechanism and may be a no-op in most cases. The backend implementation is at https://github.com/apache/hive/blob/966b83e3b9123bb455572d47878601d60b86999e/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L3247. It merely executes a SELECT query, which doesn't check whether the table exists at all. So yes, it will return an empty list.

You are right about this. I wrote a quick HMSClient class to confirm it. The following prints 0:

$javac -cp "fe/target/dependency/*" TestListPartitionNames.java
$java -cp "fe/target/dependency/*":$HADOOP_CONF_DIR:. TestListPartitionNames

import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.conf.HiveConf;
import java.util.List;

public class TestListPartitionNames {
  public static void main(String[] args) throws Exception {
    HiveMetaStoreClient client = new HiveMetaStoreClient(
        new HiveConf(), null);
    List<String> parts = client.listPartitionNames("non_existent_db_blah_blah", "foo", (short) -1);
    System.out.println(parts.size());
  }
}


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Wed, 27 Jun 2018 06:03:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3040: Remove cache directive before dropping a table

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directive before dropping a table
......................................................................


Patch Set 2:

Also, thinking a bit more about your theory, are you able to reproduce it by adding Thread.sleep() calls in the right places?
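A latch makes the suspected interleaving deterministic, whereas sleeps only make it likely. The toy model below is not Impala code: an in-memory map stands in for HDFS, the loader thread builds its list of files and then blocks until a concurrent "drop" has deleted them, and only then fetches block metadata for its now stale list, failing with the same kind of FileNotFoundException discussed in this review. `RaceRepro`, `fs`, and `getBlockCount()` are hypothetical names.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class RaceRepro {
  // Toy "HDFS": file path -> number of blocks.
  static final Map<String, Integer> fs = new ConcurrentHashMap<>();

  static int getBlockCount(String path) throws FileNotFoundException {
    Integer blocks = fs.get(path);
    if (blocks == null) {
      throw new FileNotFoundException("File does not exist: " + path);
    }
    return blocks;
  }

  // Runs one forced interleaving and returns the loader's failure, or null.
  static Throwable run() throws InterruptedException {
    fs.clear();
    fs.put("/tbl/j=2/data.0", 1);
    CountDownLatch snapshotTaken = new CountDownLatch(1);
    CountDownLatch filesDeleted = new CountDownLatch(1);
    ExecutorService loader = Executors.newSingleThreadExecutor();
    Future<Integer> load = loader.submit(() -> {
      // Loader: build the list of files to load, then pause.
      List<String> toLoad = new ArrayList<>(fs.keySet());
      snapshotTaken.countDown();
      filesDeleted.await();  // where a Thread.sleep() would otherwise go
      int total = 0;
      for (String path : toLoad) {
        total += getBlockCount(path);  // stale list: the file is gone
      }
      return total;
    });
    snapshotTaken.await();
    fs.clear();  // concurrent drop deletes the table's files
    filesDeleted.countDown();
    try {
      load.get();
      return null;
    } catch (ExecutionException e) {
      return e.getCause();
    } finally {
      loader.shutdown();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Prints the FileNotFoundException raised by the loader.
    System.out.println(run());
  }
}
```

In the real catalog the equivalent pause would have to be injected at the suspected point in the load path; the latch version just makes the hypothesis itself easy to reason about.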


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Wed, 27 Jun 2018 16:58:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3040: Remove cache directive before dropping a table

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directive before dropping a table
......................................................................


Patch Set 2:

The exception thrown is:

E0614 17:03:05.768528 17538 HdfsTable.java:909] Encountered an error loading block metadata for table: cachedb.cached_tbl_part
Java exception follows:
java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File does not exist: /test-warehouse/cachedb.db/cached_tbl_part/j=2/b14eab6ad3ac682a-1338d1ba00000000_385360643_data.0.
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2157)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2127)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2040)
  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:583)
  at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:94)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377)
  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1080)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2278)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2274)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2272)
  at java.util.concurrent.FutureTask.report(FutureTask.java:122)
  at java.util.concurrent.FutureTask.get(FutureTask.java:188)
  at org.apache.impala.catalog.HdfsTable.loadMetadataAndDiskIds(HdfsTable.java:904)
  at org.apache.impala.catalog.HdfsTable.updatePartitionsFromHms(HdfsTable.java:1403)
  at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1253)
  at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1199)
  at org.apache.impala.catalog.CatalogServiceCatalog.reloadTable(CatalogServiceCatalog.java:1460)
  at org.apache.impala.catalog.TableLoadingMgr.execAsyncRefreshWork(TableLoadingMgr.java:320)
  at org.apache.impala.catalog.TableLoadingMgr.access$500(TableLoadingMgr.java:48)
  at org.apache.impala.catalog.TableLoadingMgr$1.call(TableLoadingMgr.java:175)
  at org.apache.impala.catalog.TableLoadingMgr$1.call(TableLoadingMgr.java:171)
  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: File does not exist: /test-warehouse/cachedb.db/cached_tbl_part/j=2/b14eab6ad3ac682a-1338d1ba00000000_385360643_data.0.
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2157)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2127)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2040)
  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:583)
  at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:94)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377)
  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1080)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2278)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2274)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2272)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
  at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
  at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
  at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1326)
  at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1311)
  at org.apache.hadoop.hdfs.DFSClient.getBlockLocations(DFSClient.java:1369)
  at org.apache.hadoop.hdfs.DistributedFileSystem$2.doCall(DistributedFileSystem.java:250)
  at org.apache.hadoop.hdfs.DistributedFileSystem$2.doCall(DistributedFileSystem.java:246)
  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:246)
  at org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:237)
  at org.apache.impala.catalog.HdfsTable.refreshFileMetadata(HdfsTable.java:503)
  at org.apache.impala.catalog.HdfsTable.access$000(HdfsTable.java:116)
  at org.apache.impala.catalog.HdfsTable$FileMetadataLoadRequest.call(HdfsTable.java:335)
  at org.apache.impala.catalog.HdfsTable$FileMetadataLoadRequest.call(HdfsTable.java:317)
  ... 4 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /test-warehouse/cachedb.db/cached_tbl_part/j=2/b14eab6ad3ac682a-1338d1ba00000000_385360643_data.0.
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2157)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2127)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2040)
  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:583)
  at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:94)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377)
  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1080)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2278)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2274)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2272)
  at org.apache.hadoop.ipc.Client.call(Client.java:1510)
  at org.apache.hadoop.ipc.Client.call(Client.java:1447)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:231)
  at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:268)
  at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
  at com.sun.proxy.$Proxy12.getBlockLocations(Unknown Source)
  at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1324)
  ... 15 more


I went through the code again and have a new theory, though I'm not confident about it:
I think the table was dropped concurrently with https://github.com/apache/impala/blob/e6abf8e86058349531caabe0a800432b1703e8f1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1361, and listPartitionNames() returned an empty list. Therefore the partition was dropped from the table at L1400 and was not loaded back. loadMetadataAndDiskIds(), on the other hand, operates on the list generated at L1389, so it still ran even though msPartitionNames was empty, and threw the exception.
Now the question is whether listPartitionNames() returns an empty list if the table doesn't exist. The first thing to notice is that listPartitionNames() can throw NoSuchObjectException, so intuitively that should happen if the table doesn't exist, but it doesn't. The relevant code is at https://github.com/apache/hive/blob/966b83e3b9123bb455572d47878601d60b86999e/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L4717. The NoSuchObjectException is only thrown by fireReadTablePreEvent(), which is some kind of hook mechanism and may be a no-op in most cases. The backend implementation is at https://github.com/apache/hive/blob/966b83e3b9123bb455572d47878601d60b86999e/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L3247. It merely executes a SELECT query, which doesn't check whether the table exists at all. So yes, it will return an empty list.
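Because an empty result from listPartitionNames() is indistinguishable from "the table was dropped", a caller reconciling partitions could double-check before treating every loaded partition as removed. The sketch below illustrates that idea behind a hypothetical two-method `Hms` interface. Hive's IMetaStoreClient does provide listPartitionNames(db, table, max) and tableExists(db, table), but the `listIfTableExists()` guard is only an illustration, not something this change is known to do.

```java
import java.util.List;
import java.util.Optional;

class PartitionLister {
  // Hypothetical subset of a metastore client, for illustration only.
  interface Hms {
    List<String> listPartitionNames(String db, String table, short maxParts);
    boolean tableExists(String db, String table);
  }

  // Lists partition names, returning Optional.empty() if the table is gone.
  // The existence check runs after the listing, so an empty list is only
  // trusted if the table still exists once the list has been obtained.
  static Optional<List<String>> listIfTableExists(Hms hms, String db, String table) {
    List<String> names = hms.listPartitionNames(db, table, (short) -1);
    if (names.isEmpty() && !hms.tableExists(db, table)) {
      return Optional.empty();
    }
    return Optional.of(names);
  }

  public static void main(String[] args) {
    // A client whose table was dropped: empty listing, table no longer exists.
    Hms dropped = new Hms() {
      @Override
      public List<String> listPartitionNames(String db, String table, short maxParts) {
        return List.of();
      }

      @Override
      public boolean tableExists(String db, String table) {
        return false;
      }
    };
    System.out.println(listIfTableExists(dropped, "cachedb", "cached_tbl_part"));  // Optional.empty
  }
}
```

Checking existence after the listing narrows the window but cannot close it entirely (the table could be dropped and recreated in between), so it complements rather than replaces correct uncaching in the drop path.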


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Wed, 27 Jun 2018 02:02:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3040: Remove cache directives if a partition is dropped externally

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directives if a partition is dropped externally
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. This is experimental - please report any issues to tarmstrong@cloudera.com or on this JIRA: https://issues.apache.org/jira/browse/IMPALA-7317


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 9
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Mon, 23 Jul 2018 19:50:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3040: Remove cache directive before dropping a table

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directive before dropping a table
......................................................................


Patch Set 2:

Thanks for the explanation. I guess I understand the issue now. 

The basic problem here seems to be that HdfsTable.dropPartition() does not clean up the cache directives of the partitions. We do that inside CatalogOpExecutor#alterTableDropPartition():

  private Table alterTableDropPartition(Table tbl,.....)
  ......
  if (part.isMarkedCached()) {
    HdfsCachingUtil.removePartitionCacheDirective(part);
  }
  ....
  }

But since these partitions were already dropped via Hive, Impala only clears its own state with dropPartition(), and the later uncacheTable() call no longer helps. Here is a simple repro of this issue:

// Create a partitioned cached table and add some partitions
impala> create table cached_tbl_part (i int) partitioned by (j int) cached in 'testPool';
impala> insert into cached_tbl_part (i,j) select 1, 2;

// Make sure cache directives are created.
$ hdfs cacheadmin -listDirectives | grep cached_tbl_part
277 testPool        1 never   /test-warehouse/cached_tbl_part                
278 testPool        1 never   /test-warehouse/cached_tbl_part/j=2  <----

// Drop the partition from hive
hive> alter table cached_tbl_part drop partition (j=2);

// Refresh the table from Impala
impala> refresh cached_tbl_part;

// Cache directives still exist
$ hdfs cacheadmin -listDirectives | grep cached_tbl_part
277 testPool        1 never   /test-warehouse/cached_tbl_part                
278 testPool        1 never   /test-warehouse/cached_tbl_part/j=2   <--- should have been dropped

// Drop the table from Impala

impala> drop table cached_tbl_part;

$ hdfs cacheadmin -listDirectives | grep cached_tbl_part
278 testPool        1 never   /test-warehouse/cached_tbl_part/j=2   <--- The table's directive was dropped, but the partition's directive still remains.
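A leftover like directive 278 can be spotted by comparing directive paths against the partition locations the catalog still knows about. This is a sketch under assumptions: the listing lines are taken in the five-column shape shown above (the real tool's output may differ across versions, and paths containing spaces are not handled), and the set of live partition directories is supplied by the caller. `findOrphans()` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class OrphanedDirectives {
  // Takes lines shaped like the listing above ("ID POOL REPL EXPIRY PATH"),
  // skips anything whose first field is not a numeric ID (e.g. headers), and
  // returns the IDs of directives on paths below tableDir that are not a
  // partition location the catalog still knows about.
  static List<Long> findOrphans(List<String> listing, String tableDir,
      Set<String> livePartitionDirs) {
    List<Long> orphans = new ArrayList<>();
    for (String line : listing) {
      String[] fields = line.trim().split("\\s+");
      if (fields.length < 5 || !fields[0].matches("\\d+")) continue;
      String path = fields[4];
      if (path.startsWith(tableDir + "/") && !livePartitionDirs.contains(path)) {
        orphans.add(Long.parseLong(fields[0]));
      }
    }
    return orphans;
  }

  public static void main(String[] args) {
    List<String> listing = List.of(
        "277 testPool        1 never   /test-warehouse/cached_tbl_part",
        "278 testPool        1 never   /test-warehouse/cached_tbl_part/j=2");
    // After the Hive-side drop and REFRESH, the catalog no longer has j=2.
    System.out.println(findOrphans(listing, "/test-warehouse/cached_tbl_part", Set.of()));  // [278]
  }
}
```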


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Jul 2018 07:43:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3040: Remove cache directives if a partition is dropped externally

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directives if a partition is dropped externally
......................................................................

IMPALA-3040: Remove cache directives if a partition is dropped externally

HdfsTable.dropPartition() doesn't uncache the partition right now. If
the partition is dropped from Hive and refreshed in Impala, the
partition will be removed from the catalog but the cache directive
remains. Because Impala uses the HMS client directly to drop a
table/database, the cache directive won't be removed even when the
table is dropped in Impala if background loading runs concurrently
with the HMS client RPC call. This patch removes the cache directive in
dropPartition() to fix this bug.

Change-Id: Id7701a499405e961456adea63f3592b43bd69170
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/query_test/test_hdfs_caching.py
3 files changed, 29 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/10792/6

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 6
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-3040: Remove cache directives during background partition dropping

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directives during background partition dropping
......................................................................

IMPALA-3040: Remove cache directives during background partition dropping

HdfsTable.dropPartition() doesn't uncache the partition right now. If
the table is later dropped, the partition won't be uncached either
because it has already been removed from the catalog. This patch
removes the cache directive in dropPartition() to fix this bug.

Change-Id: Id7701a499405e961456adea63f3592b43bd69170
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
1 file changed, 8 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/10792/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 3
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-3040: Remove cache directive before dropping a table

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directive before dropping a table
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10792/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10792/1//COMMIT_MSG@9
PS1, Line 9: drop a table while the catalog is
           : loading it. If the HDFS files of a partition are removed when the
           : partition is being loaded, the catalog object will be in an inconsistent
           : state and the catalog will fail to recognize some cached partitions and
           : not remove the cache directives
Can you clarify what this means? Looking at the code, getOrLoadTable() and removeTable() are synchronized on 'versionLock_'. Or do you mean they are removed outside of Impala, e.g. using Hive / Hadoop?

Just want to be sure that I understand the problem correctly.


http://gerrit.cloudera.org:8080/#/c/10792/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/10792/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1452
PS1, Line 1452: catch (NoSuchObjectException e) {
              :         throw new ImpalaRuntimeException(String.format("Table %s no longer exists in " +
              :             "the Hive MetaStore. Run 'invalidate metadata %s' to update the Impala " +
              :             "catalog.", tableName, tableName));
              :       } catch (TException e) {
              :         throw new ImpalaRuntimeException(
              :             String.format(HMS_RPC_ERROR_FORMAT_STR, "dropTable"), e);
              :       }
Do you mean that if we happen to throw any of these, we don't uncache the table?




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Mon, 25 Jun 2018 21:05:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3040: Remove cache directives if a partition is dropped externally

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directives if a partition is dropped externally
......................................................................


Patch Set 7:

How about having an overloaded method, dropPartitions(list<Partition>, boolean removeCacheDirective)? It can be true by default, but we set it to false in the dirtyPartitions case. Is it more complicated than that?
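Here is a minimal sketch of this suggestion. It uses a hypothetical in-memory table model, not Impala's HdfsTable. Python has no method overloading, so a keyword argument that defaults to `True` plays the role of the overload. A partition that was dropped externally and detected on refresh gets uncached. Reloading a dirty partition (drop it, then re-add the fresh copy) passes `False`, so its directive survives. That reload path is presumably what caused the GVO failure discussed in this thread, where altering the table uncached partitions that were already cached.

```python
class Table:
    """Hypothetical table model: partition name -> cached flag, plus directives."""

    def __init__(self):
        self.partitions = {}
        self.cache_directives = set()

    def add_partition(self, name, cached):
        self.partitions[name] = cached
        if cached:
            self.cache_directives.add(name)

    def drop_partitions(self, names, remove_cache_directive=True):
        dropped = []
        for name in names:
            cached = self.partitions.pop(name, None)
            if cached is None:
                continue  # partition does not exist
            if cached and remove_cache_directive:
                self.cache_directives.discard(name)
            dropped.append(name)
        return dropped

    def reload_dirty(self, names):
        # Dirty partitions are replaced, not deleted: keep their directives.
        cached_flags = {n: self.partitions[n] for n in names if n in self.partitions}
        self.drop_partitions(names, remove_cache_directive=False)
        for name, cached in cached_flags.items():
            self.partitions[name] = cached  # re-add the freshly loaded copy


t = Table()
t.add_partition("j=1", cached=True)
t.add_partition("j=2", cached=True)
t.reload_dirty(["j=1"])     # directive for j=1 must survive the reload
t.drop_partitions(["j=2"])  # external drop: directive for j=2 is removed
print(sorted(t.cache_directives))  # ['j=1']
```

Defaulting to `True` keeps the safe behavior for every existing caller. Only the reload path has to opt out.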



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 7
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Fri, 20 Jul 2018 00:14:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3040: Remove cache directives if a partition is dropped externally

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directives if a partition is dropped externally
......................................................................


Patch Set 7:

Did you get a chance to triage the GVO failure? Is it related to the patch change?



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 7
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Jul 2018 18:37:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3040: Remove cache directives if a partition is dropped externally

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directives if a partition is dropped externally
......................................................................


Patch Set 12:

Build Started https://jenkins.impala.io/job/gerrit-code-review-checks/29/ 

Running initial code review checks. This is experimental - please report any issues to tarmstrong@cloudera.com or on this JIRA: IMPALA-7317



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 12
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Tue, 24 Jul 2018 17:59:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3040: Remove cache directives if a partition is dropped externally

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directives if a partition is dropped externally
......................................................................

IMPALA-3040: Remove cache directives if a partition is dropped externally

HdfsTable.dropPartition() doesn't uncache the partition right now. If
the partition is dropped from Hive and refreshed in Impala, the
partition will be removed from the catalog but the cache directive
remains. Because Impala uses the HMS client directly to drop a
table/database, the cache directive won't be removed even when the
table is dropped in Impala if background loading runs concurrently
with the HMS client RPC call. This patch removes the cache directive in
dropPartition() to fix this bug.

Change-Id: Id7701a499405e961456adea63f3592b43bd69170
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/query_test/test_hdfs_caching.py
3 files changed, 28 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/10792/4

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-3040: Remove cache directives if a partition is dropped externally

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directives if a partition is dropped externally
......................................................................


Patch Set 9:

Build Started https://jenkins.impala.io/job/gerrit-code-review-checks/8/ 

Running initial code review checks.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 9
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Mon, 23 Jul 2018 19:17:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3040: Remove cache directive before dropping a table

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directive before dropping a table
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10792/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/10792/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1317
PS2, Line 1317:       if (db != null) {
I don't think this is the right approach. Firstly, this is still prone to races. Secondly, if the dropDatabase() in L1324 fails for some reason, we'd have unnecessarily uncached the tables eagerly.

How about fixing HdfsTable.dropPartition() to also clean up the partition's cache directive? (Refer to my example in the CR comment.)




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Jul 2018 07:47:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3040: Remove cache directives if a partition is dropped externally

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directives if a partition is dropped externally
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. This is experimental - please report any issues to tarmstrong@cloudera.com or on this JIRA: https://issues.apache.org/jira/browse/IMPALA-7317



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 9
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Mon, 23 Jul 2018 19:50:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3040: Remove cache directives if a partition is dropped externally

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directives if a partition is dropped externally
......................................................................


Patch Set 7:

> Patch Set 7:
> 
> Did you get a chance to triage the GVO failure? Is it related to the patch change?

It failed at https://github.com/apache/impala/blob/2a40e8f2a973391b61165ebd95cb30b9b67d93ba/testdata/workloads/functional-query/queries/QueryTest/hdfs-caching.test#L249. Altering the table rendered already cached partitions as uncached. I'm trying to understand that code path.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 7
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Thu, 19 Jul 2018 21:55:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3040: Remove cache directives during background partition dropping

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directives during background partition dropping
......................................................................


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/10792/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10792/3//COMMIT_MSG@7
PS3, Line 7: during background partition dropping
Maybe say "external table drops"?


http://gerrit.cloudera.org:8080/#/c/10792/3//COMMIT_MSG@9
PS3, Line 9:  
Add some context about what happens when it is dropped from Hive.


http://gerrit.cloudera.org:8080/#/c/10792/3//COMMIT_MSG@13
PS3, Line 13: 
Could you add a test for this in test_hdfs_caching?


http://gerrit.cloudera.org:8080/#/c/10792/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10792/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1158
PS3, Line 1158:    * HdfsPartition that was dropped or null if the partition does not exist.
Update the comment to mention that this drops the cache directive if the partition is cached.


http://gerrit.cloudera.org:8080/#/c/10792/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1191
PS3, Line 1191:     if (partition.isMarkedCached()) {
I think this should only run on the Catalog server?


http://gerrit.cloudera.org:8080/#/c/10792/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1193
PS3, Line 1193:         HdfsCachingUtil.removePartitionCacheDirective(partition);
Do we need to remove a similar check from CatalogOpEx#alterTableDropPartition()? No point in doing it twice.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 3
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Jul 2018 22:39:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3040: Remove cache directive before dropping a table

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directive before dropping a table
......................................................................


Patch Set 2:

> Do you know what in test_caching_ddl() is calling this drop (drop db cascade/drop table etc.) ?
https://github.com/apache/impala/blob/master/tests/query_test/test_hdfs_caching.py#L207
BTW, the specific case I looked into failed in test_caching_ddl_drop_database.

> Also, thinking a bit more about your theory, are you able to reproduce it by adding Thread.sleep() s in the required places?
The tricky part is that reloadTable() calls into HMS to get the table before loading partitions and we need to let that one succeed. Plus load() is called multiple times and the exact timing becomes unclear. I spent some time on it but no luck so far.

The fix in this patch should still work.
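You can force this interleaving deterministically, without relying on Thread.sleep(), by giving the loader explicit pause points that a test controls. The sketch below is a hypothetical Python model, not Impala code. `Catalog`, its `cleared`/`resume` events, `load()` and `drop_table()` only mimic the sequence described for this race: the load clears the old partitions, the HMS drop happens, then the load fails. In Java the same idea would need a test-only hook, for example a `CountDownLatch`. That hook is an assumption here.

```python
import threading


class Catalog:
    """Hypothetical catalog model with test-controlled pause points."""

    def __init__(self, partitions):
        self.partitions = dict(partitions)  # HDFS path -> cached?
        self.hms_has_table = True
        self.cleared = threading.Event()  # loader has cleared old partitions
        self.resume = threading.Event()   # test allows the loader to continue

    def load(self, hms_partitions):
        self.partitions.clear()           # old partition objects are gone now
        self.cleared.set()
        self.resume.wait(timeout=5)       # deterministic pause, no sleep()
        if not self.hms_has_table:
            raise LookupError("table no longer exists in HMS")
        self.partitions.update(hms_partitions)

    def drop_table(self, directives):
        self.hms_has_table = False        # the HMS drop happens first
        for path, cached in self.partitions.items():
            if cached:
                directives.discard(path)  # uncache what the catalog can see


directives = {"/wh/t/j=2"}
catalog = Catalog({"/wh/t/j=2": True})
errors = []


def loader():
    try:
        catalog.load({"/wh/t/j=2": True})
    except LookupError as e:
        errors.append(e)


t = threading.Thread(target=loader)
t.start()
catalog.cleared.wait(timeout=5)  # loader is now inside the race window
catalog.drop_table(directives)   # the drop sees no partitions to uncache
catalog.resume.set()
t.join()
print(directives, len(errors))  # {'/wh/t/j=2'} 1
```

The events only pin down the order of the steps, so the test does not depend on timing.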



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Fri, 29 Jun 2018 02:14:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3040: Remove cache directives if a partition is dropped externally

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directives if a partition is dropped externally
......................................................................


Patch Set 4: Code-Review+2

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10792/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10792/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1174
PS4, Line 1174: LOG.error("Unable to remove a cache directive: " + e.getMessage())
Please include table and partition name in the message. Also do something like LOG.error(msg, e) so that the stack trace is printed too.


http://gerrit.cloudera.org:8080/#/c/10792/4/tests/query_test/test_hdfs_caching.py
File tests/query_test/test_hdfs_caching.py:

http://gerrit.cloudera.org:8080/#/c/10792/4/tests/query_test/test_hdfs_caching.py@291
PS4, Line 291:     assert num_entries_pre + 1 == get_num_cache_requests()
nit: add comments like in L287, e.g. that we expect the partition cache directive to be dropped.


http://gerrit.cloudera.org:8080/#/c/10792/4/tests/query_test/test_hdfs_caching.py@292
PS4, Line 292:     self.client.execute("drop table cached_tbl_part")
nit: Do we need to put the drop in a finally block? Since we are using 'cached_tbl_part' in different tests, if the cleanup does not happen for some reason (test errors, etc.), other tests could fail (e.g. L224). Or just use another/random table name to avoid this mess.
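Here is a sketch of both options in this comment: a random table name per test and a `finally` cleanup. `RecordingClient` and `run_with_cached_table` are hypothetical helpers. In test_hdfs_caching.py the client would be `self.client`, and `check` would hold assertions such as the get_num_cache_requests() comparisons.

```python
import uuid


class RecordingClient:
    """Hypothetical stand-in for the test's Impala client; records statements."""

    def __init__(self):
        self.statements = []

    def execute(self, stmt):
        self.statements.append(stmt)


def unique_table_name(prefix="cached_tbl_part"):
    # A random suffix avoids clashing with tables used by other tests.
    return f"{prefix}_{uuid.uuid4().hex[:8]}"


def run_with_cached_table(client, check):
    """Creates a cached partitioned table, runs check(), always drops it."""
    tbl = unique_table_name()
    client.execute(f"create table {tbl} (i int) partitioned by (j int) "
                   f"cached in 'testPool'")
    try:
        client.execute(f"insert into {tbl} (i, j) select 1, 2")
        check(tbl)
    finally:
        # Runs even when check() raises, so no cached table leaks into
        # later tests.
        client.execute(f"drop table if exists {tbl}")
    return tbl
```

Either measure alone would already stop one failing test from breaking the others. Together they also keep reruns independent.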




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Jul 2018 06:48:06 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3040: Remove cache directives if a partition is dropped externally

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directives if a partition is dropped externally
......................................................................


Patch Set 5: Code-Review+2

(1 comment)

Feel free to carry +2 after the fix and submit for GVO.

http://gerrit.cloudera.org:8080/#/c/10792/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10792/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1175
PS5, Line 1175: getName
getName() does not include the database name. Use getFullName() instead. Also, you can just call getFullName() instead of partition.getTable().getFullName(), since we are already in the table class.
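Here is a hedged Python analog of the two logging comments: name the table by its full `db.table` name plus the partition, and pass the exception so the stack trace is logged. `remove_partition_cache_directive` and its `uncache` parameter are hypothetical. In the Java code the corresponding pieces would be getFullName() and LOG.error(msg, e). The sketch assumes, as the quoted LOG.error(...) line suggests, that a failure is logged rather than rethrown.

```python
import logging

LOG = logging.getLogger("catalog")


def remove_partition_cache_directive(table_full_name, partition_name, uncache):
    """Hypothetical wrapper; `uncache` stands in for the real uncache call.

    Returns True on success. On failure it logs an ERROR naming both the
    table (db.table) and the partition, attaches the exception so the
    traceback is printed, and returns False instead of propagating.
    """
    try:
        uncache()
    except Exception as e:
        LOG.error("Unable to remove the cache directive of partition %s "
                  "of table %s", partition_name, table_full_name, exc_info=e)
        return False
    return True


def failing_uncache():
    raise IOError("simulated HDFS caching failure")
```

Passing the exception object via `exc_info` is Python's counterpart of handing the Throwable to the logger in Java.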




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 5
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Jul 2018 20:28:14 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3040: Remove cache directives if a partition is dropped externally

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directives if a partition is dropped externally
......................................................................

IMPALA-3040: Remove cache directives if a partition is dropped externally

HdfsTable.dropPartition() doesn't uncache the partition right now. If
the partition is dropped from Hive and refreshed in Impala, the
partition will be removed from the catalog but the cache directive
remains. Because Impala uses the HMS client directly to drop a
table/database, the cache directive won't be removed even when the
table is dropped in Impala if background loading runs concurrently
with the HMS client RPC call. This patch removes the cache directive in
dropPartition() if the partition is removed from HMS.

Change-Id: Id7701a499405e961456adea63f3592b43bd69170
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/query_test/test_hdfs_caching.py
3 files changed, 48 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/10792/9

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 9
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-3040: Remove cache directives if a partition is dropped externally

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directives if a partition is dropped externally
......................................................................

IMPALA-3040: Remove cache directives if a partition is dropped externally

HdfsTable.dropPartition() doesn't uncache the partition right now. If
the partition is dropped from Hive and refreshed in Impala, the
partition will be removed from the catalog but the cache directive
remains. Because Impala uses the HMS client directly to drop a
table/database, the cache directive won't be removed even when the
table is dropped in Impala if background loading runs concurrently
with the HMS client RPC call. This patch removes the cache directive in
dropPartition() if the partition is removed from HMS.

Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Reviewed-on: http://gerrit.cloudera.org:8080/10792
Reviewed-by: Bharath Vissapragada <bh...@cloudera.com>
Tested-by: Tianyi Wang <tw...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/query_test/test_hdfs_caching.py
3 files changed, 49 insertions(+), 13 deletions(-)

Approvals:
  Bharath Vissapragada: Looks good to me, approved
  Tianyi Wang: Verified


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 13
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-3040: Remove cache directives if a partition is dropped externally

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directives if a partition is dropped externally
......................................................................


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/10792/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10792/3//COMMIT_MSG@7
PS3, Line 7: if a partition is dropped externally
> may be say external table drops?
Done


http://gerrit.cloudera.org:8080/#/c/10792/3//COMMIT_MSG@9
PS3, Line 9:  
> Add some context about what happens when it is dropped from Hive.
Done


http://gerrit.cloudera.org:8080/#/c/10792/3//COMMIT_MSG@13
PS3, Line 13: table/database, the cache directive won't be removed even if the table
> Could you add a test for this in test_hdfs_caching?
Done


http://gerrit.cloudera.org:8080/#/c/10792/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10792/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1158
PS3, Line 1158:    * HdfsPartition that was dropped or null if the partition does not exist.
> Update that this drops the cache directive if its cached.
Done


http://gerrit.cloudera.org:8080/#/c/10792/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1191
PS3, Line 1191:       // If there are multiple partition ids corresponding to a literal, remove
> I think this should only run on the Catalog server?
I checked the callers of this function and it can only be called from catalogd.


http://gerrit.cloudera.org:8080/#/c/10792/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1193
PS3, Line 1193:       if (partitionIds.size() > 1) partitionIds.remove(partitionId);
> Do we need to remove a similar check from CatalogOpEx#alterTableDropPartiti
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Jul 2018 00:54:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3040: Remove cache directive before dropping a table

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directive before dropping a table
......................................................................

IMPALA-3040: Remove cache directive before dropping a table

One way to hit IMPALA-3040 is to drop a table while the catalog is
loading it. The problematic test drops the cached table/database and
then checks whether the cache directive has been removed. When the
table is dropped, its HMS metadata is removed first. If a table loading
operation is running concurrently, it fails because it cannot find the
table in HMS. By the time the loading procedure throws, the old
partition objects have already been cleared from the table's catalog
object, so the catalog no longer has the metadata it needs to find and
remove the cache directives.
There are several potential solutions:
- Lock the tables and the databases before dropping. We don't currently
  have database locks, so this is not trivial.
- Fix the table loading procedure so that it loads and replaces
  existing partitions atomically.
- Remove the cache directives first.
This patch takes the last approach.
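A minimal sketch of the ordering this patch set describes, assuming the directive ids are read from catalog metadata before anything is dropped: uncache first, then drop from HMS. `DropTableSketch`, `CacheClient`, `MetastoreClient` and `dropCachedTable` are hypothetical stand-ins, not Impala or Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of "remove the cache directives first". The interfaces
 * below are illustrative stand-ins, not real Impala/Hadoop client APIs.
 */
public class DropTableSketch {
  /** Hypothetical client that removes one HDFS cache directive. */
  public interface CacheClient {
    void removeDirective(long directiveId);
  }

  /** Hypothetical Hive Metastore client. */
  public interface MetastoreClient {
    void dropTable(String db, String table);
  }

  public static void dropCachedTable(String db, String table,
      List<Long> directiveIds, CacheClient cache, MetastoreClient hms) {
    // Step 1: uncache while the directive ids are still known. A concurrent
    // reload may clear the partition objects once the table is gone from HMS.
    for (long id : directiveIds) cache.removeDirective(id);
    // Step 2: drop from HMS. A concurrent load that fails after this point
    // can no longer leave orphaned cache directives behind.
    hms.dropTable(db, table);
  }

  public static void main(String[] args) {
    List<String> log = new ArrayList<>();
    dropCachedTable("db", "cached_tbl", List.of(7L, 8L),
        id -> log.add("uncache " + id),
        (d, t) -> log.add("drop " + d + "." + t));
    System.out.println(log); // [uncache 7, uncache 8, drop db.cached_tbl]
  }
}
```

One trade-off of this ordering, under the same assumptions: if the HMS drop fails after step 1, the table survives but without its cache directives.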

Change-Id: Id7701a499405e961456adea63f3592b43bd69170
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
1 file changed, 7 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/10792/2
-- 
To view, visit http://gerrit.cloudera.org:8080/10792

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-3040: Remove cache directives if a partition is dropped externally

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/10792 )

Change subject: IMPALA-3040: Remove cache directives if a partition is dropped externally
......................................................................

IMPALA-3040: Remove cache directives if a partition is dropped externally

HdfsTable.dropPartition() doesn't currently uncache the partition. If
a partition is dropped from Hive and then refreshed in Impala, the
partition is removed from the catalog but its cache directive remains.
Because Impala uses the HMS client directly to drop a table/database,
the cache directive is also left behind when the table is dropped
through Impala if background loading runs concurrently with the HMS
client RPC call. This patch removes the cache directive in
dropPartition() to fix this bug.
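A minimal sketch of the idea in this patch set: the catalog's own dropPartition() releases the partition's cache directive. Every path that removes a partition from the catalog then uncaches it, whether the path is Impala DDL or a refresh after an external drop in Hive. `CatalogTableSketch`, `Partition` and `DirectiveRemover` are hypothetical stand-ins, not Impala's HdfsTable, HdfsPartition or caching utilities.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a catalog table whose dropPartition() uncaches. */
public class CatalogTableSketch {
  /** Hypothetical callback that removes one HDFS cache directive. */
  public interface DirectiveRemover {
    void remove(long directiveId);
  }

  /** A catalog partition; cacheDirectiveId is null when it is not cached. */
  public static final class Partition {
    final long id;
    final Long cacheDirectiveId;

    Partition(long id, Long cacheDirectiveId) {
      this.id = id;
      this.cacheDirectiveId = cacheDirectiveId;
    }
  }

  private final Map<Long, Partition> partitions = new HashMap<>();
  private final DirectiveRemover remover;

  public CatalogTableSketch(DirectiveRemover remover) {
    this.remover = remover;
  }

  public void addPartition(long id, Long cacheDirectiveId) {
    partitions.put(id, new Partition(id, cacheDirectiveId));
  }

  /**
   * Removes a partition from the catalog and, if it was cached, its cache
   * directive. Returns the dropped partition, or null if it does not exist.
   */
  public Partition dropPartition(long id) {
    Partition p = partitions.remove(id);
    if (p == null) return null;
    // Uncache here instead of in each caller, so no drop path can skip it.
    if (p.cacheDirectiveId != null) remover.remove(p.cacheDirectiveId);
    return p;
  }

  public static void main(String[] args) {
    List<Long> removed = new ArrayList<>();
    CatalogTableSketch table = new CatalogTableSketch(removed::add);
    table.addPartition(1, 42L);  // cached partition
    table.addPartition(2, null); // uncached partition
    table.dropPartition(1);      // e.g. a refresh found it dropped in Hive
    table.dropPartition(2);
    System.out.println(removed); // [42]
  }
}
```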

Change-Id: Id7701a499405e961456adea63f3592b43bd69170
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/query_test/test_hdfs_caching.py
3 files changed, 30 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/10792/5
-- 
To view, visit http://gerrit.cloudera.org:8080/10792

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170
Gerrit-Change-Number: 10792
Gerrit-PatchSet: 5
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>