You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@impala.apache.org by mj...@apache.org on 2016/12/13 03:13:25 UTC

[1/2] incubator-impala git commit: IMPALA-4611: Checking perms on S3 files is a very expensive no-op

Repository: incubator-impala
Updated Branches:
  refs/heads/master 2414d4c9c -> 73e41cea1


IMPALA-4611: Checking perms on S3 files is a very expensive no-op

We call getPermissions() on partition directories to find out if
Impala has access to those files. On S3, this currently is a no-op
as the S3A connector does not try to set/get the permissions for S3
objects. So, it always returns the default set of permissions -> 777.
However, it still makes a roundtrip to S3 causing a slow down in the
Catalog.

We can return the READ_WRITE permission immediately if we know we are
accessing an S3 file, thereby avoiding the round trip to S3 for every
partition. This will greatly speedup metadata operations for S3 tables
and partitions, which is already known to be a big bottleneck.

If and when the S3A connector is able to manage permissions in
the future, we need to revisit this code. However, as permissions on
S3 are unsupported by Impala right now, we might as well gain on perf.

Change-Id: If9d1072c185a6162727019cdf1cb34d7f3f1c75c
Reviewed-on: http://gerrit.cloudera.org:8080/5449
Reviewed-by: Sailesh Mukil <sa...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/ffbdeda9
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/ffbdeda9
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/ffbdeda9

Branch: refs/heads/master
Commit: ffbdeda9469997dce6c6c3a80c78877486a3bdd9
Parents: 2414d4c
Author: Sailesh Mukil <sa...@cloudera.com>
Authored: Fri Dec 9 15:12:25 2016 -0800
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Tue Dec 13 01:56:28 2016 +0000

----------------------------------------------------------------------
 fe/src/main/java/org/apache/impala/catalog/HdfsTable.java | 10 ++++++++++
 1 file changed, 10 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ffbdeda9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java b/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
index ae5e811..c2c569f 100644
--- a/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
+++ b/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
@@ -795,9 +795,19 @@ public class HdfsTable extends Table {
    * permissions Impala has on the given path. If the path does not exist, recurses up
    * the path until a existing parent directory is found, and inherit access permissions
    * from that.
+   * Always returns READ_WRITE for S3 files.
    */
   private TAccessLevel getAvailableAccessLevel(FileSystem fs, Path location)
       throws IOException {
+
+    // Avoid calling getPermissions() on file path for S3 files, as that makes a round
+    // trip to S3. Also, the S3A connector is currently unable to manage S3 permissions,
+    // so for now it is safe to assume that all files(objects) have READ_WRITE
+    // permissions, as that's what the S3A connector will always return too.
+    // TODO: Revisit if the S3A connector is updated to be able to manage S3 object
+    // permissions. (see HADOOP-13892)
+    if (FileSystemUtil.isS3AFileSystem(fs)) return TAccessLevel.READ_WRITE;
+
     FsPermissionChecker permissionChecker = FsPermissionChecker.getInstance();
     while (location != null) {
       if (fs.exists(location)) {


[2/2] incubator-impala git commit: IMPALA-4642: Fix TestFragmentLifecycle failures; kudu test must wait

Posted by mj...@apache.org.
IMPALA-4642: Fix TestFragmentLifecycle failures; kudu test must wait

Fixes test failures in TestFragmentLifecycle when it runs
after TestKuduMemLimits which takes some time for all
fragments to finish closing, even though the query is
finished. TestFragmentLifecycle checks that there are no
fragments in flight. For now, this fixes the tests by
forcing TestKuduMemLimits to wait for all 'in flight'
fragments to complete before continuing. We still need to
understand why the KuduScanNode/KuduScanner is taking so
long to Close() (see IMPALA-4654).

Change-Id: Ia655a37ff06e92cc55ba05f01d5e94fe39447c65
Reviewed-on: http://gerrit.cloudera.org:8080/5481
Reviewed-by: Sailesh Mukil <sa...@cloudera.com>
Reviewed-by: Henry Robinson <he...@cloudera.com>
Tested-by: Internal Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/73e41cea
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/73e41cea
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/73e41cea

Branch: refs/heads/master
Commit: 73e41cea196703701d40cc67f919287fb3511b9b
Parents: ffbdeda
Author: Matthew Jacobs <mj...@cloudera.com>
Authored: Mon Dec 12 14:32:07 2016 -0800
Committer: Internal Jenkins <cl...@gerrit.cloudera.org>
Committed: Tue Dec 13 03:12:42 2016 +0000

----------------------------------------------------------------------
 tests/query_test/test_kudu.py | 11 +++++++++++
 1 file changed, 11 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/73e41cea/tests/query_test/test_kudu.py
----------------------------------------------------------------------
diff --git a/tests/query_test/test_kudu.py b/tests/query_test/test_kudu.py
index 9a5bab9..17769bd 100644
--- a/tests/query_test/test_kudu.py
+++ b/tests/query_test/test_kudu.py
@@ -34,6 +34,8 @@ import textwrap
 
 from tests.common import KUDU_MASTER_HOSTS
 from tests.common.kudu_test_suite import KuduTestSuite
+from tests.common.impala_cluster import ImpalaCluster
+from tests.verifiers.metric_verifier import MetricVerifier
 
 LOG = logging.getLogger(__name__)
 
@@ -614,3 +616,12 @@ class TestKuduMemLimits(KuduTestSuite):
         if (mem_limit > self.QUERY_MEM_LIMITS[i]):
           raise
         assert "Memory limit exceeded" in str(e)
+
+    # IMPALA-4645: Wait for fragments to complete; in some tests KuduScanNodes took some
+    # time to Close() after the query returned all rows. This is necessary to ensure
+    # these queries do not impact other tests.
+    # TODO: Scan nodes shouldn't take so long to shutdown; remove when this is
+    # fixed (IMPALA-4654).
+    verifiers = [ MetricVerifier(i.service) for i in ImpalaCluster().impalads ]
+    for v in verifiers:
+      v.wait_for_metric("impala-server.num-fragments-in-flight", 0, timeout=120)