Posted to commits@impala.apache.org by ph...@apache.org on 2018/05/22 03:34:08 UTC

[1/5] impala git commit: IMPALA-6317: Add -cmake_only option to buildall.sh

Repository: impala
Updated Branches:
  refs/heads/master 5c7d3b12e -> 23e11dc72


IMPALA-6317: Add -cmake_only option to buildall.sh

It's sometimes useful to be able to build a complete Impala dev
environment without necessarily building the Impala binary itself
-- e.g., when one wants to use the internal test framework to run
tests against an instance of Impala running on a remote cluster.

- This patch adds a -cmake_only flag to buildall.sh, which is then
  propagated to make_impala.sh.

- Added a missing line to the help text re: passing the -ninja
  command line option
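
A minimal usage sketch (hypothetical, not part of the patch): invoking the
new flag from a Python bootstrap script. Only the -cmake_only flag itself
comes from this change; the wrapper is illustrative.

  # Generate the CMake build files without compiling the backend, e.g.
  # before running the test framework against a remote cluster.
  import subprocess
  subprocess.check_call(["./buildall.sh", "-cmake_only"])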

Change-Id: If31a4e29425a6a20059cba2f43b72e4fb908018f
Reviewed-on: http://gerrit.cloudera.org:8080/10455
Reviewed-by: David Knupp <dk...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/7485d608
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/7485d608
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/7485d608

Branch: refs/heads/master
Commit: 7485d6082cb9d298e7ba5a829ff15dcc4937d338
Parents: 5c7d3b1
Author: David Knupp <dk...@cloudera.com>
Authored: Mon Mar 5 17:14:36 2018 -0800
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Mon May 21 21:55:49 2018 +0000

----------------------------------------------------------------------
 buildall.sh | 5 +++++
 1 file changed, 5 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/7485d608/buildall.sh
----------------------------------------------------------------------
diff --git a/buildall.sh b/buildall.sh
index 1805a16..a86581a 100755
--- a/buildall.sh
+++ b/buildall.sh
@@ -183,6 +183,9 @@ do
       LZO_CMAKE_ARGS+=" -GNinja"
       MAKE_CMD=ninja
       ;;
+    -cmake_only)
+      MAKE_IMPALA_ARGS+=" -cmake_only"
+      ;;
     -help|*)
       echo "buildall.sh - Builds Impala and runs all tests."
       echo "[-noclean] : Omits cleaning all packages before building. Will not kill"\
@@ -217,6 +220,8 @@ do
       echo "[-so|-build_shared_libs] : Dynamically link executables (default is static)"
       echo "[-kerberize] : Enable kerberos on the cluster"
       echo "[-fe_only] : Build just the frontend"
+      echo "[-ninja] : Use ninja instead of make"
+      echo "[-cmake_only] : Generate makefiles only, instead of doing a full build"
       echo "-----------------------------------------------------------------------------
 Examples of common tasks:
 


[4/5] impala git commit: IMPALA-7019: Schedule EC as remote & disable failed tests

Posted by ph...@apache.org.
IMPALA-7019: Schedule EC as remote & disable failed tests

This patch schedules HDFS EC files without considering locality. Failing
tests are disabled, and a Jenkins build should succeed with export
ERASURE_CODING=true.
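
The scheduling idea, as a small Python sketch (hypothetical names; the real
change is in the Java frontend below): every block of an erasure-coded file
is mapped to a synthetic remote entry in the host index, so the planner
never assumes local reads for EC data.

  REMOTE_NETWORK_ADDRESS = ("remote*", 0)  # synthetic "remote" host

  def block_host_indexes(block_replica_hosts, host_index, is_ec):
      # Map each block of one file to an entry in the shared host index.
      if is_ec:
          idx = host_index.setdefault(REMOTE_NETWORK_ADDRESS, len(host_index))
          return [idx] * len(block_replica_hosts)
      # Non-EC blocks keep a real replica host (first replica, for brevity).
      return [host_index.setdefault(replicas[0], len(host_index))
              for replicas in block_replica_hosts]

  hosts = {}
  blocks = [["host1", "host2"], ["host2", "host3"]]
  print(block_host_indexes(blocks, hosts, is_ec=True))   # [0, 0]
  print(block_host_indexes(blocks, hosts, is_ec=False))  # [1, 2]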

Testing: It passes core tests.

Cherry-picks: not for 2.x.

Change-Id: I138738d3e28e5daa1718c05c04cd9dd146c4ff84
Reviewed-on: http://gerrit.cloudera.org:8080/10413
Reviewed-by: Taras Bobrovytsky <tb...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/21d92aac
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/21d92aac
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/21d92aac

Branch: refs/heads/master
Commit: 21d92aacbfdbe9780b983acfacd02ced4bb0c132
Parents: 482ea39
Author: Tianyi Wang <tw...@cloudera.com>
Authored: Mon May 14 12:14:35 2018 -0700
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Tue May 22 01:10:14 2018 +0000

----------------------------------------------------------------------
 common/fbs/CatalogObjects.fbs                   |  3 +++
 .../apache/impala/catalog/HdfsPartition.java    | 28 +++++++++++++-------
 .../org/apache/impala/catalog/HdfsTable.java    | 18 ++++++-------
 .../org/apache/impala/planner/HdfsScanNode.java |  3 ++-
 tests/common/skip.py                            | 11 ++++++--
 .../custom_cluster/test_admission_controller.py |  4 ++-
 tests/custom_cluster/test_hdfs_fd_caching.py    |  3 ++-
 tests/metadata/test_explain.py                  |  3 ++-
 tests/query_test/test_hdfs_caching.py           |  3 ++-
 tests/query_test/test_insert.py                 |  4 ++-
 tests/query_test/test_insert_parquet.py         |  3 ++-
 tests/query_test/test_mt_dop.py                 |  3 +++
 tests/query_test/test_nested_types.py           |  5 +++-
 tests/query_test/test_queries.py                |  5 ++++
 tests/query_test/test_query_mem_limit.py        |  2 ++
 tests/query_test/test_scanners.py               |  4 +++
 tests/util/filesystem_utils.py                  |  1 +
 17 files changed, 75 insertions(+), 28 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/21d92aac/common/fbs/CatalogObjects.fbs
----------------------------------------------------------------------
diff --git a/common/fbs/CatalogObjects.fbs b/common/fbs/CatalogObjects.fbs
index c08099d..d320dfa 100644
--- a/common/fbs/CatalogObjects.fbs
+++ b/common/fbs/CatalogObjects.fbs
@@ -73,4 +73,7 @@ table FbFileDesc {
 
   // List of FbFileBlocks that make up this file.
   file_blocks: [FbFileBlock] (id: 4);
+
+  // Whether this file is erasure-coded
+  is_ec: bool = false (id: 5);
 }
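
The compatibility property relied on above, sketched in Python (plain dicts
standing in for FlatBuffers accessors, not the FlatBuffers API): a field
added with a default, like is_ec, is simply absent from descriptors written
by older code, and readers fall back to the default.

  def read_is_ec(file_desc):
      return file_desc.get("is_ec", False)  # schema default when absent

  print(read_is_ec({"file_name": "part-0"}))                 # False (old writer)
  print(read_is_ec({"file_name": "part-0", "is_ec": True}))  # True (new writer)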

http://git-wip-us.apache.org/repos/asf/impala/blob/21d92aac/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java b/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
index 1b05804..3939ae2 100644
--- a/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
+++ b/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
@@ -102,22 +102,29 @@ public class HdfsPartition implements Comparable<HdfsPartition> {
      * Creates the file descriptor of a file represented by 'fileStatus' with blocks
      * stored in 'blockLocations'. 'fileSystem' is the filesystem where the
      * file resides and 'hostIndex' stores the network addresses of the hosts that store
-     * blocks of the parent HdfsTable. Populates 'numUnknownDiskIds' with the number of
-     * unknown disk ids.
+     * blocks of the parent HdfsTable. 'isEc' indicates whether the file is erasure-coded.
+     * Populates 'numUnknownDiskIds' with the number of unknown disk ids.
      */
     public static FileDescriptor create(FileStatus fileStatus,
         BlockLocation[] blockLocations, FileSystem fileSystem,
-        ListMap<TNetworkAddress> hostIndex, Reference<Long> numUnknownDiskIds)
-        throws IOException {
+        ListMap<TNetworkAddress> hostIndex, boolean isEc,
+        Reference<Long> numUnknownDiskIds) throws IOException {
       Preconditions.checkState(FileSystemUtil.supportsStorageIds(fileSystem));
       FlatBufferBuilder fbb = new FlatBufferBuilder(1);
       int[] fbFileBlockOffsets = new int[blockLocations.length];
       int blockIdx = 0;
       for (BlockLocation loc: blockLocations) {
-        fbFileBlockOffsets[blockIdx++] = FileBlock.createFbFileBlock(fbb, loc, hostIndex,
-            numUnknownDiskIds);
+        if (isEc) {
+          fbFileBlockOffsets[blockIdx++] = FileBlock.createFbFileBlock(fbb,
+              loc.getOffset(), loc.getLength(),
+              (short) hostIndex.getIndex(REMOTE_NETWORK_ADDRESS));
+        } else {
+          fbFileBlockOffsets[blockIdx++] =
+              FileBlock.createFbFileBlock(fbb, loc, hostIndex, numUnknownDiskIds);
+        }
       }
-      return new FileDescriptor(createFbFileDesc(fbb, fileStatus, fbFileBlockOffsets));
+      return new FileDescriptor(createFbFileDesc(fbb, fileStatus, fbFileBlockOffsets,
+          isEc));
     }
 
     /**
@@ -132,7 +139,8 @@ public class HdfsPartition implements Comparable<HdfsPartition> {
       FlatBufferBuilder fbb = new FlatBufferBuilder(1);
       int[] fbFileBlockOffets =
           synthesizeFbBlockMd(fbb, fileStatus, fileFormat, hostIndex);
-      return new FileDescriptor(createFbFileDesc(fbb, fileStatus, fbFileBlockOffets));
+      return new FileDescriptor(createFbFileDesc(fbb, fileStatus, fbFileBlockOffets,
+          false));
     }
 
     /**
@@ -142,13 +150,14 @@ public class HdfsPartition implements Comparable<HdfsPartition> {
      * buffer.
      */
     private static FbFileDesc createFbFileDesc(FlatBufferBuilder fbb,
-        FileStatus fileStatus, int[] fbFileBlockOffets) {
+        FileStatus fileStatus, int[] fbFileBlockOffets, boolean isEc) {
       int fileNameOffset = fbb.createString(fileStatus.getPath().getName());
       int blockVectorOffset = FbFileDesc.createFileBlocksVector(fbb, fbFileBlockOffets);
       FbFileDesc.startFbFileDesc(fbb);
       FbFileDesc.addFileName(fbb, fileNameOffset);
       FbFileDesc.addLength(fbb, fileStatus.getLen());
       FbFileDesc.addLastModificationTime(fbb, fileStatus.getModificationTime());
+      FbFileDesc.addIsEc(fbb, isEc);
       HdfsCompression comp = HdfsCompression.fromFileName(fileStatus.getPath().getName());
       FbFileDesc.addCompression(fbb, comp.toFb());
       FbFileDesc.addFileBlocks(fbb, blockVectorOffset);
@@ -209,6 +218,7 @@ public class HdfsPartition implements Comparable<HdfsPartition> {
 
     public long getModificationTime() { return fbFileDescriptor_.lastModificationTime(); }
     public int getNumFileBlocks() { return fbFileDescriptor_.fileBlocksLength(); }
+    public boolean getIsEc() { return fbFileDescriptor_.isEc(); }
 
     public FbFileBlock getFbFileBlock(int idx) {
       return fbFileDescriptor_.fileBlocks(idx);

http://git-wip-us.apache.org/repos/asf/impala/blob/21d92aac/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java b/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
index 9910eb8..d6b8fb8 100644
--- a/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
+++ b/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
@@ -434,14 +434,14 @@ public class HdfsTable extends Table {
         continue;
       }
       FileDescriptor fd;
-      // Block locations are manually synthesized if the underlying fs does not support
-      // the block location API.
       if (synthesizeFileMd) {
+        // Block locations are manually synthesized if the underlying fs does not support
+        // the block location API.
         fd = FileDescriptor.createWithSynthesizedBlockMd(fileStatus,
             partitions.get(0).getFileFormat(), hostIndex_);
       } else {
-        fd = FileDescriptor.create(fileStatus,
-            fileStatus.getBlockLocations(), fs, hostIndex_, numUnknownDiskIds);
+        fd = FileDescriptor.create(fileStatus, fileStatus.getBlockLocations(), fs,
+            hostIndex_, fileStatus.isErasureCoded(), numUnknownDiskIds);
       }
       newFileDescs.add(fd);
       ++loadStats.loadedFiles;
@@ -508,13 +508,13 @@ public class HdfsTable extends Table {
       FileDescriptor fd = fileDescsByName.get(fileName);
       if (isPartitionMarkedCached || hasFileChanged(fd, fileStatus)) {
         if (synthesizeFileMd) {
-          fd = FileDescriptor.createWithSynthesizedBlockMd(fileStatus,
-              fileFormat, hostIndex_);
+          fd = FileDescriptor.createWithSynthesizedBlockMd(fileStatus, fileFormat,
+              hostIndex_);
         } else {
           BlockLocation[] locations =
-            fs.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
-          fd = FileDescriptor.create(fileStatus, locations, fs, hostIndex_,
-              numUnknownDiskIds);
+              fs.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
+          fd = FileDescriptor.create(fileStatus, locations, fs, hostIndex_,
+              fileStatus.isErasureCoded(), numUnknownDiskIds);
         }
         ++loadStats.loadedFiles;
       } else {

http://git-wip-us.apache.org/repos/asf/impala/blob/21d92aac/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java b/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
index dbaa965..d0c83c3 100644
--- a/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
+++ b/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
@@ -801,7 +801,8 @@ public class HdfsScanNode extends ScanNode {
             // Translate from network address to the global (to this request) host index.
             Integer globalHostIdx = analyzer.getHostIndex().getIndex(networkAddress);
             location.setHost_idx(globalHostIdx);
-            if (checkMissingDiskIds && FileBlock.getDiskId(block, i) == -1) {
+            if (checkMissingDiskIds && !fileDesc.getIsEc() &&
+                FileBlock.getDiskId(block, i) == -1) {
               ++numScanRangesNoDiskIds_;
               partitionMissingDiskIds = true;
               fileDescMissingDiskIds = true;

http://git-wip-us.apache.org/repos/asf/impala/blob/21d92aac/tests/common/skip.py
----------------------------------------------------------------------
diff --git a/tests/common/skip.py b/tests/common/skip.py
index 9a5d95d..53fbe85 100644
--- a/tests/common/skip.py
+++ b/tests/common/skip.py
@@ -26,11 +26,12 @@ from functools import partial
 
 from tests.common.environ import IMPALAD_BUILD
 from tests.util.filesystem_utils import (
+    IS_ADLS,
+    IS_EC,
+    IS_HDFS,
     IS_ISILON,
     IS_LOCAL,
-    IS_HDFS,
     IS_S3,
-    IS_ADLS,
     SECONDARY_FILESYSTEM)
 
 
@@ -143,3 +144,9 @@ class SkipIfNotHdfsMinicluster:
 class SkipIfBuildType:
   not_dev_build = pytest.mark.skipif(not IMPALAD_BUILD.is_dev(),
       reason="Tests depends on debug build startup option.")
+
+class SkipIfEC:
+  remote_read = pytest.mark.skipif(IS_EC, reason="EC files are read remotely and "
+      "features relying on local read do not work.")
+  oom = pytest.mark.skipif(IS_EC, reason="Probably broken by HDFS-13540.")
+  fix_later = pytest.mark.skipif(IS_EC, reason="It should work but doesn't.")

http://git-wip-us.apache.org/repos/asf/impala/blob/21d92aac/tests/custom_cluster/test_admission_controller.py
----------------------------------------------------------------------
diff --git a/tests/custom_cluster/test_admission_controller.py b/tests/custom_cluster/test_admission_controller.py
index d1d9dd8..eba0358 100644
--- a/tests/custom_cluster/test_admission_controller.py
+++ b/tests/custom_cluster/test_admission_controller.py
@@ -32,7 +32,8 @@ from tests.common.environ import specific_build_type_timeout, IMPALAD_BUILD
 from tests.common.impala_test_suite import ImpalaTestSuite
 from tests.common.skip import (
     SkipIfS3,
-    SkipIfADLS)
+    SkipIfADLS,
+    SkipIfEC)
 from tests.common.test_dimensions import (
     create_single_exec_option_dimension,
     create_uncompressed_text_dimension)
@@ -384,6 +385,7 @@ class TestAdmissionController(TestAdmissionControllerBase, HS2TestSuite):
 
   @SkipIfS3.hdfs_block_size
   @SkipIfADLS.hdfs_block_size
+  @SkipIfEC.fix_later
   @pytest.mark.execute_serially
   @CustomClusterTestSuite.with_args(
       impalad_args=impalad_admission_ctrl_flags(max_requests=1, max_queued=1,

http://git-wip-us.apache.org/repos/asf/impala/blob/21d92aac/tests/custom_cluster/test_hdfs_fd_caching.py
----------------------------------------------------------------------
diff --git a/tests/custom_cluster/test_hdfs_fd_caching.py b/tests/custom_cluster/test_hdfs_fd_caching.py
index ad80cef..1afe431 100644
--- a/tests/custom_cluster/test_hdfs_fd_caching.py
+++ b/tests/custom_cluster/test_hdfs_fd_caching.py
@@ -18,7 +18,7 @@
 import pytest
 
 from tests.common.custom_cluster_test_suite import CustomClusterTestSuite
-from tests.common.skip import SkipIfLocal
+from tests.common.skip import SkipIfLocal, SkipIfEC
 from tests.util.filesystem_utils import (
     IS_ISILON,
     IS_S3,
@@ -26,6 +26,7 @@ from tests.util.filesystem_utils import (
 from time import sleep
 
 @SkipIfLocal.hdfs_fd_caching
+@SkipIfEC.remote_read
 class TestHdfsFdCaching(CustomClusterTestSuite):
   """Tests that if HDFS file handle caching is enabled, file handles are actually cached
   and the associated metrics return valid results. In addition, tests that the upper bound

http://git-wip-us.apache.org/repos/asf/impala/blob/21d92aac/tests/metadata/test_explain.py
----------------------------------------------------------------------
diff --git a/tests/metadata/test_explain.py b/tests/metadata/test_explain.py
index 3ad411a..ba206f2 100644
--- a/tests/metadata/test_explain.py
+++ b/tests/metadata/test_explain.py
@@ -20,12 +20,13 @@
 import re
 
 from tests.common.impala_test_suite import ImpalaTestSuite
-from tests.common.skip import SkipIfLocal, SkipIfNotHdfsMinicluster
+from tests.common.skip import SkipIfLocal, SkipIfNotHdfsMinicluster, SkipIfEC
 from tests.util.filesystem_utils import WAREHOUSE
 
 # Tests the different explain levels [0-3] on a few queries.
 # TODO: Clean up this test to use an explain level test dimension and appropriate
 # result sub-sections for the expected explain plans.
+@SkipIfEC.fix_later
 class TestExplain(ImpalaTestSuite):
   # Value for the num_scanner_threads query option to ensure that the memory estimates of
   # scan nodes are consistent even when run on machines with different numbers of cores.

http://git-wip-us.apache.org/repos/asf/impala/blob/21d92aac/tests/query_test/test_hdfs_caching.py
----------------------------------------------------------------------
diff --git a/tests/query_test/test_hdfs_caching.py b/tests/query_test/test_hdfs_caching.py
index c013ed4..f16a4a4 100644
--- a/tests/query_test/test_hdfs_caching.py
+++ b/tests/query_test/test_hdfs_caching.py
@@ -25,7 +25,7 @@ from subprocess import check_call
 from tests.common.environ import specific_build_type_timeout
 from tests.common.impala_cluster import ImpalaCluster
 from tests.common.impala_test_suite import ImpalaTestSuite, LOG
-from tests.common.skip import SkipIfS3, SkipIfADLS, SkipIfIsilon, SkipIfLocal
+from tests.common.skip import SkipIfS3, SkipIfADLS, SkipIfIsilon, SkipIfLocal, SkipIfEC
 from tests.common.test_dimensions import create_single_exec_option_dimension
 from tests.util.filesystem_utils import get_fs_path
 from tests.util.shell_util import exec_process
@@ -35,6 +35,7 @@ from tests.util.shell_util import exec_process
 @SkipIfADLS.caching
 @SkipIfIsilon.caching
 @SkipIfLocal.caching
+@SkipIfEC.fix_later
 class TestHdfsCaching(ImpalaTestSuite):
   @classmethod
   def get_workload(self):

http://git-wip-us.apache.org/repos/asf/impala/blob/21d92aac/tests/query_test/test_insert.py
----------------------------------------------------------------------
diff --git a/tests/query_test/test_insert.py b/tests/query_test/test_insert.py
index 88aafb2..20fee41 100644
--- a/tests/query_test/test_insert.py
+++ b/tests/query_test/test_insert.py
@@ -22,7 +22,7 @@ import pytest
 from testdata.common import widetable
 from tests.common.impala_cluster import ImpalaCluster
 from tests.common.impala_test_suite import ImpalaTestSuite
-from tests.common.skip import SkipIfLocal, SkipIfNotHdfsMinicluster
+from tests.common.skip import SkipIfEC, SkipIfLocal, SkipIfNotHdfsMinicluster
 from tests.common.test_dimensions import (
     create_exec_option_dimension,
     create_uncompressed_text_dimension)
@@ -112,6 +112,8 @@ class TestInsertQueries(ImpalaTestSuite):
     super(TestInsertQueries, cls).setup_class()
 
   @pytest.mark.execute_serially
+  # Erasure coding doesn't respect memory limit
+  @SkipIfEC.fix_later
   def test_insert(self, vector):
     if (vector.get_value('table_format').file_format == 'parquet'):
       vector.get_value('exec_option')['COMPRESSION_CODEC'] = \

http://git-wip-us.apache.org/repos/asf/impala/blob/21d92aac/tests/query_test/test_insert_parquet.py
----------------------------------------------------------------------
diff --git a/tests/query_test/test_insert_parquet.py b/tests/query_test/test_insert_parquet.py
index 1e8ce6e..4af81c9 100644
--- a/tests/query_test/test_insert_parquet.py
+++ b/tests/query_test/test_insert_parquet.py
@@ -29,7 +29,7 @@ from parquet.ttypes import ColumnOrder, SortingColumn, TypeDefinedOrder
 from tests.common.environ import impalad_basedir
 from tests.common.impala_test_suite import ImpalaTestSuite
 from tests.common.parametrize import UniqueDatabase
-from tests.common.skip import SkipIfIsilon, SkipIfLocal, SkipIfS3, SkipIfADLS
+from tests.common.skip import SkipIfEC, SkipIfIsilon, SkipIfLocal, SkipIfS3, SkipIfADLS
 from tests.common.test_dimensions import create_exec_option_dimension
 from tests.common.test_vector import ImpalaTestDimension
 from tests.util.filesystem_utils import get_fs_path
@@ -101,6 +101,7 @@ class TestInsertParquetQueries(ImpalaTestSuite):
     cls.ImpalaTestMatrix.add_constraint(
         lambda v: v.get_value('table_format').compression_codec == 'none')
 
+  @SkipIfEC.oom
   @SkipIfLocal.multiple_impalad
   @UniqueDatabase.parametrize(sync_ddl=True)
   def test_insert_parquet(self, vector, unique_database):

http://git-wip-us.apache.org/repos/asf/impala/blob/21d92aac/tests/query_test/test_mt_dop.py
----------------------------------------------------------------------
diff --git a/tests/query_test/test_mt_dop.py b/tests/query_test/test_mt_dop.py
index b6e08e5..0766f2e 100644
--- a/tests/query_test/test_mt_dop.py
+++ b/tests/query_test/test_mt_dop.py
@@ -22,6 +22,7 @@ import pytest
 from copy import deepcopy
 from tests.common.impala_test_suite import ImpalaTestSuite
 from tests.common.kudu_test_suite import KuduTestSuite
+from tests.common.skip import SkipIfEC
 from tests.common.test_vector import ImpalaTestDimension
 
 # COMPUTE STATS on Parquet tables automatically sets MT_DOP=4, so include
@@ -97,6 +98,8 @@ class TestMtDopParquet(ImpalaTestSuite):
     vector.get_value('exec_option')['mt_dop'] = vector.get_value('mt_dop')
     self.run_test_case('QueryTest/mt-dop-parquet-nested', vector)
 
+  # Impala scans fewer row groups than it should with erasure coding.
+  @SkipIfEC.fix_later
   def test_parquet_filtering(self, vector):
     """IMPALA-4624: Test that dictionary filtering eliminates row groups correctly."""
     vector.get_value('exec_option')['mt_dop'] = vector.get_value('mt_dop')

http://git-wip-us.apache.org/repos/asf/impala/blob/21d92aac/tests/query_test/test_nested_types.py
----------------------------------------------------------------------
diff --git a/tests/query_test/test_nested_types.py b/tests/query_test/test_nested_types.py
index e62bf4f..0603745 100644
--- a/tests/query_test/test_nested_types.py
+++ b/tests/query_test/test_nested_types.py
@@ -27,8 +27,10 @@ from tests.common.skip import (
     SkipIfIsilon,
     SkipIfS3,
     SkipIfADLS,
+    SkipIfEC,
     SkipIfLocal,
-    SkipIfNotHdfsMinicluster)
+    SkipIfNotHdfsMinicluster
+    )
 from tests.common.test_vector import ImpalaTestDimension
 from tests.util.filesystem_utils import WAREHOUSE, get_fs_path
 
@@ -86,6 +88,7 @@ class TestNestedTypes(ImpalaTestSuite):
     a 3-node HDFS minicluster."""
     self.run_test_case('QueryTest/nested-types-tpch-mem-limit', vector)
 
+  @SkipIfEC.fix_later
   def test_parquet_stats(self, vector):
     """Queries that test evaluation of Parquet row group statistics."""
     self.run_test_case('QueryTest/nested-types-parquet-stats', vector)

http://git-wip-us.apache.org/repos/asf/impala/blob/21d92aac/tests/query_test/test_queries.py
----------------------------------------------------------------------
diff --git a/tests/query_test/test_queries.py b/tests/query_test/test_queries.py
index a4fb91e..90cb392 100644
--- a/tests/query_test/test_queries.py
+++ b/tests/query_test/test_queries.py
@@ -22,6 +22,7 @@ import pytest
 import re
 
 from tests.common.impala_test_suite import ImpalaTestSuite
+from tests.common.skip import SkipIfEC
 from tests.common.test_dimensions import create_uncompressed_text_dimension, extend_exec_option_dimension
 from tests.common.test_vector import ImpalaTestVector
 
@@ -166,12 +167,14 @@ class TestQueriesTextTables(ImpalaTestSuite):
     vector.get_value('exec_option')['num_nodes'] = 1
     self.run_test_case('QueryTest/distinct-estimate', vector)
 
+  @SkipIfEC.oom
   def test_random(self, vector):
     # These results will vary slightly depending on how the values get split up
     # so only run with 1 node and on text.
     vector.get_value('exec_option')['num_nodes'] = 1
     self.run_test_case('QueryTest/random', vector)
 
+  @SkipIfEC.oom
   def test_values(self, vector):
     self.run_test_case('QueryTest/values', vector)
 
@@ -188,6 +191,7 @@ class TestQueriesParquetTables(ImpalaTestSuite):
   def get_workload(cls):
     return 'functional-query'
 
+  @SkipIfEC.oom
   @pytest.mark.execute_serially
   def test_very_large_strings(self, vector):
     """Regression test for IMPALA-1619. Doesn't need to be run on all file formats.
@@ -219,6 +223,7 @@ class TestHdfsQueries(ImpalaTestSuite):
   def get_workload(cls):
     return 'functional-query'
 
+  @SkipIfEC.oom
   def test_hdfs_scan_node(self, vector):
     self.run_test_case('QueryTest/hdfs-scan-node', vector)
 

http://git-wip-us.apache.org/repos/asf/impala/blob/21d92aac/tests/query_test/test_query_mem_limit.py
----------------------------------------------------------------------
diff --git a/tests/query_test/test_query_mem_limit.py b/tests/query_test/test_query_mem_limit.py
index 17ea9f5..97d3ae7 100644
--- a/tests/query_test/test_query_mem_limit.py
+++ b/tests/query_test/test_query_mem_limit.py
@@ -24,6 +24,7 @@ from copy import copy
 
 from tests.beeswax.impala_beeswax import ImpalaBeeswaxException
 from tests.common.impala_test_suite import ImpalaTestSuite
+from tests.common.skip import SkipIfEC
 from tests.common.test_dimensions import (
     ImpalaTestDimension,
     create_single_exec_option_dimension,
@@ -87,6 +88,7 @@ class TestQueryMemLimit(ImpalaTestSuite):
     cls.ImpalaTestMatrix.add_constraint(
         lambda v: v.get_value('exec_option')['batch_size'] == 0)
 
+  @SkipIfEC.oom
   @pytest.mark.execute_serially
   def test_mem_limit(self, vector):
     mem_limit = copy(vector.get_value('mem_limit'))

http://git-wip-us.apache.org/repos/asf/impala/blob/21d92aac/tests/query_test/test_scanners.py
----------------------------------------------------------------------
diff --git a/tests/query_test/test_scanners.py b/tests/query_test/test_scanners.py
index 4ded221..61a4862 100644
--- a/tests/query_test/test_scanners.py
+++ b/tests/query_test/test_scanners.py
@@ -35,6 +35,7 @@ from tests.common.impala_test_suite import ImpalaTestSuite, LOG
 from tests.common.skip import (
     SkipIfS3,
     SkipIfADLS,
+    SkipIfEC,
     SkipIfIsilon,
     SkipIfLocal)
 from tests.common.test_dimensions import (
@@ -485,6 +486,7 @@ class TestParquet(ImpalaTestSuite):
   @SkipIfADLS.hdfs_block_size
   @SkipIfIsilon.hdfs_block_size
   @SkipIfLocal.multiple_impalad
+  @SkipIfEC.fix_later
   def test_misaligned_parquet_row_groups(self, vector):
     """IMPALA-3989: Test that no warnings are issued when misaligned row groups are
     encountered. Make sure that 'NumScannersWithNoReads' counters are set to the number of
@@ -555,6 +557,7 @@ class TestParquet(ImpalaTestSuite):
   @SkipIfADLS.hdfs_block_size
   @SkipIfIsilon.hdfs_block_size
   @SkipIfLocal.multiple_impalad
+  @SkipIfEC.fix_later
   def test_multiple_blocks_one_row_group(self, vector):
     # For IMPALA-1881. The table functional_parquet.lineitem_multiblock_one_row_group has
     # 3 blocks but only one row group across these blocks. We test to see that only one
@@ -954,6 +957,7 @@ class TestOrc(ImpalaTestSuite):
 
   @SkipIfS3.hdfs_block_size
   @SkipIfADLS.hdfs_block_size
+  @SkipIfEC.fix_later
   @SkipIfIsilon.hdfs_block_size
   @SkipIfLocal.multiple_impalad
   def test_misaligned_orc_stripes(self, vector, unique_database):

http://git-wip-us.apache.org/repos/asf/impala/blob/21d92aac/tests/util/filesystem_utils.py
----------------------------------------------------------------------
diff --git a/tests/util/filesystem_utils.py b/tests/util/filesystem_utils.py
index 77112be..82f6584 100644
--- a/tests/util/filesystem_utils.py
+++ b/tests/util/filesystem_utils.py
@@ -30,6 +30,7 @@ IS_ISILON = FILESYSTEM == "isilon"
 IS_LOCAL = FILESYSTEM == "local"
 IS_HDFS = FILESYSTEM == "hdfs"
 IS_ADLS = FILESYSTEM == "adls"
+IS_EC = os.getenv("ERASURE_CODING") == "true"
 # This condition satisfies both the states where one can assume a default fs
 #   - The environment variable is set to an empty string.
 #   - The environment variable is unset (None)
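
Putting the two pieces together (a sketch mirroring the diffs above rather
than quoting the framework verbatim): the environment variable is read once
at import time, and the skipif markers evaluate it when tests are collected.

  import os
  import pytest

  IS_EC = os.getenv("ERASURE_CODING") == "true"

  # Evaluated at collection time: under EC the test is skipped, not failed.
  @pytest.mark.skipif(IS_EC, reason="EC files are read remotely and "
      "features relying on local read do not work.")
  def test_local_read_feature():
      pass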


[3/5] impala git commit: IMPALA-7011: Simplify PlanRootSink control logic

Posted by ph...@apache.org.
IMPALA-7011: Simplify PlanRootSink control logic

1) The eos_ and sender_done_ bits really encode three possible states
   that the sender can be in. Make this explicit using an enum with
   three values.

2) The purpose of CloseConsumer() has changed over time and we can clean
   this up now:

 a) Originally, it looks like it was used to unblock the sender when the
   consumer finishes before eos, but also to keep the sink alive long
   enough for the coordinator. This is no longer necessary now that
   control structures are owned by the QueryState whose lifetime is
   controlled by a reference count taken by the coordinator. So, we don't
   need the coordinator to tell the sink it's done calling it and we
   don't need the consumer_done_ state.

 b) Later on, CloseConsumer() was used as a cancellation mechanism.
   We need to keep this around (or use timeouts on the condvars) to kick
   both the consumer and producer on cancellation. But let's make the
   cancellation logic similar to the exec nodes and other sinks by
   driving the cancellation using the RuntimeState's cancellation
   flag. Now that CloseConsumer() is only about cancellation, rename it
   to Cancel() (later we may promote it to DataSink and implement in the
   data stream sender as well).
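
The resulting handshake, as a minimal Python sketch (hypothetical names,
simplified from the C++ below): one sender, one consumer, two condition
variables over a shared lock, a three-state sender enum, and a cancel()
that only wakes both sides so they can observe the cancellation flag.

  import enum
  import threading

  class SenderState(enum.Enum):
      ROWS_PENDING = 0    # still producing rows; the only non-terminal state
      EOS = 1             # all rows were passed to send()
      CLOSED_NOT_EOS = 2  # closed before reaching end-of-stream

  class PlanRootSinkSketch:
      def __init__(self):
          lock = threading.Lock()
          self.lock = lock
          self.sender_cv = threading.Condition(lock)    # wakes the sender
          self.consumer_cv = threading.Condition(lock)  # wakes the consumer
          self.state = SenderState.ROWS_PENDING
          self.results = None     # result set handed over by the consumer
          self.cancelled = False  # stands in for RuntimeState::is_cancelled()

      def send(self, batch):  # sender: one call per row batch
          with self.lock:
              # Wait for a result set to fill, or for cancellation.
              while self.results is None and not self.cancelled:
                  self.sender_cv.wait()
              if self.cancelled:
                  return
              self.results.extend(batch)  # simplified: one batch per request
              self.results = None         # request satisfied
              self.consumer_cv.notify_all()

      def flush_final(self):  # sender: signals end-of-stream
          with self.lock:
              self.state = SenderState.EOS
              self.consumer_cv.notify_all()

      def close(self):  # sender: may run without flush_final() on error
          with self.lock:
              if self.state == SenderState.ROWS_PENDING:
                  self.state = SenderState.CLOSED_NOT_EOS
              self.consumer_cv.notify_all()

      def get_next(self, result_set):  # consumer: returns the eos flag
          with self.lock:
              self.results = result_set
              self.sender_cv.notify_all()
              while (self.state == SenderState.ROWS_PENDING
                     and self.results is not None and not self.cancelled):
                  self.consumer_cv.wait()
              return self.state == SenderState.EOS

      def cancel(self):  # wake everyone; they re-check the flag themselves
          with self.lock:
              self.cancelled = True
              self.sender_cv.notify_all()
              self.consumer_cv.notify_all()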

Testing:
- Exhaustive
- Minicluster concurrent_select.py stress

Change-Id: Ifc75617a253fd43a6122baa4b4dc7aeb1dbe633f
Reviewed-on: http://gerrit.cloudera.org:8080/10449
Reviewed-by: Dan Hecht <dh...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/482ea391
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/482ea391
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/482ea391

Branch: refs/heads/master
Commit: 482ea3914093064da1f4f176b6c616150100768c
Parents: a5aa6ff
Author: Dan Hecht <dh...@cloudera.com>
Authored: Thu May 17 17:03:54 2018 -0700
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Mon May 21 23:50:42 2018 +0000

----------------------------------------------------------------------
 be/src/exec/plan-root-sink.cc             | 39 +++++++-------
 be/src/exec/plan-root-sink.h              | 72 +++++++++++++-------------
 be/src/runtime/coordinator.cc             | 11 +---
 be/src/runtime/fragment-instance-state.cc |  6 +--
 4 files changed, 60 insertions(+), 68 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/482ea391/be/src/exec/plan-root-sink.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/plan-root-sink.cc b/be/src/exec/plan-root-sink.cc
index 836a376..a64dbb9 100644
--- a/be/src/exec/plan-root-sink.cc
+++ b/be/src/exec/plan-root-sink.cc
@@ -72,11 +72,10 @@ Status PlanRootSink::Send(RuntimeState* state, RowBatch* batch) {
   // written clients may not cope correctly with them. See IMPALA-4335.
   while (current_batch_row < batch->num_rows()) {
     unique_lock<mutex> l(lock_);
-    while (results_ == nullptr && !consumer_done_) sender_cv_.Wait(l);
-    if (consumer_done_ || batch == nullptr) {
-      eos_ = true;
-      return Status::OK();
-    }
+    // Wait until the consumer gives us a result set to fill in, or the fragment
+    // instance has been cancelled.
+    while (results_ == nullptr && !state->is_cancelled()) sender_cv_.Wait(l);
+    RETURN_IF_CANCELLED(state);
 
     // Otherwise the consumer is ready. Fill out the rows.
     DCHECK(results_ != nullptr);
@@ -107,29 +106,26 @@ Status PlanRootSink::Send(RuntimeState* state, RowBatch* batch) {
 
 Status PlanRootSink::FlushFinal(RuntimeState* state) {
   unique_lock<mutex> l(lock_);
-  sender_done_ = true;
-  eos_ = true;
+  sender_state_ = SenderState::EOS;
   consumer_cv_.NotifyAll();
   return Status::OK();
 }
 
 void PlanRootSink::Close(RuntimeState* state) {
   unique_lock<mutex> l(lock_);
-  // No guarantee that FlushFinal() has been called, so need to mark sender_done_ here as
-  // well.
-  // TODO: shouldn't this also set eos to true? do we need both eos and sender_done_?
-  sender_done_ = true;
+  // FlushFinal() won't have been called when the fragment instance encounters an error
+  // before sending all rows.
+  if (sender_state_ == SenderState::ROWS_PENDING) {
+    sender_state_ = SenderState::CLOSED_NOT_EOS;
+  }
   consumer_cv_.NotifyAll();
-  // Wait for consumer to be done, in case sender tries to tear-down this sink while the
-  // sender is still reading from it.
-  while (!consumer_done_) sender_cv_.Wait(l);
   DataSink::Close(state);
 }
 
-void PlanRootSink::CloseConsumer() {
-  unique_lock<mutex> l(lock_);
-  consumer_done_ = true;
+void PlanRootSink::Cancel(RuntimeState* state) {
+  DCHECK(state->is_cancelled());
   sender_cv_.NotifyAll();
+  consumer_cv_.NotifyAll();
 }
 
 Status PlanRootSink::GetNext(
@@ -140,9 +136,14 @@ Status PlanRootSink::GetNext(
   num_rows_requested_ = num_results;
   sender_cv_.NotifyAll();
 
-  while (!eos_ && results_ != nullptr && !sender_done_) consumer_cv_.Wait(l);
+  // Wait while the sender is still producing rows and hasn't filled in the current
+  // result set.
+  while (sender_state_ == SenderState::ROWS_PENDING && results_ != nullptr &&
+      !state->is_cancelled()) {
+    consumer_cv_.Wait(l);
+  }
 
-  *eos = eos_;
+  *eos = sender_state_ == SenderState::EOS;
   return state->GetQueryStatus();
 }
 

http://git-wip-us.apache.org/repos/asf/impala/blob/482ea391/be/src/exec/plan-root-sink.h
----------------------------------------------------------------------
diff --git a/be/src/exec/plan-root-sink.h b/be/src/exec/plan-root-sink.h
index 87ab3ef..1d64b21 100644
--- a/be/src/exec/plan-root-sink.h
+++ b/be/src/exec/plan-root-sink.h
@@ -36,19 +36,25 @@ class ScalarExprEvaluator;
 /// The consumer calls GetNext() with a QueryResultSet and a requested fetch
 /// size. GetNext() shares these fields with Send(), and then signals Send() to begin
 /// populating the result set. GetNext() returns when a) the sender has sent all of its
-/// rows b) the requested fetch size has been satisfied or c) the sender calls Close().
+/// rows b) the requested fetch size has been satisfied or c) the sender fragment
+/// instance was cancelled.
 ///
-/// Send() fills in as many rows as are requested from the current batch. When the batch
-/// is exhausted - which may take several calls to GetNext() - control is returned to the
-/// sender to produce another row batch.
+/// The sender uses Send() to fill in as many rows as are requested from the current
+/// batch. When the batch is exhausted - which may take several calls to GetNext() -
+/// Send() returns so that the fragment instance can produce another row batch.
 ///
-/// When the consumer is finished, CloseConsumer() must be called to allow the sender to
-/// exit Send(). Senders must call Close() to signal to the consumer that no more batches
-/// will be produced. CloseConsumer() may be called concurrently with GetNext(). Senders
-/// should ensure that the consumer is not blocked in GetNext() before destroying the
-/// PlanRootSink.
+/// FlushFinal() should be called by the sender to signal it has finished calling
+/// Send() for all rows. Close() should be called by the sender to release resources.
 ///
-/// The sink is thread safe up to a single producer and single consumer.
+/// When the fragment instance is cancelled, Cancel() is called to unblock both the
+/// sender and consumer. Cancel() may be called concurrently with Send(), GetNext() and
+/// Close().
+///
+/// The sink is thread safe up to a single sender and single consumer.
+///
+/// Lifetime: The sink is owned by the QueryState and has the same lifetime as
+/// QueryState. The QueryState references from the fragment instance and the Coordinator
+/// ensure that this outlives any calls to Send() and GetNext(), respectively.
 ///
 /// TODO: The consumer drives the sender in lock-step with GetNext() calls, forcing a
 /// context-switch on every invocation. Measure the impact of this, and consider moving to
@@ -62,25 +68,23 @@ class PlanRootSink : public DataSink {
   /// consumer has consumed 'batch' by calling GetNext().
   virtual Status Send(RuntimeState* state, RowBatch* batch);
 
-  /// Sets eos and notifies consumer.
+  /// Indicates eos and notifies consumer.
   virtual Status FlushFinal(RuntimeState* state);
 
-  /// To be called by sender only. Signals to the consumer that no more batches will be
-  /// produced, then blocks until someone calls CloseConsumer().
+  /// To be called by sender only. Release resources and unblocks consumer.
   virtual void Close(RuntimeState* state);
 
-  /// Populates 'result_set' with up to 'num_rows' rows produced by the fragment instance
-  /// that calls Send(). *eos is set to 'true' when there are no more rows to consume. If
-  /// CloseConsumer() is called concurrently, GetNext() will return and may not populate
-  /// 'result_set'. All subsequent calls after CloseConsumer() will do no work.
+  /// To be called by the consumer only. Populates 'result_set' with up to 'num_rows' rows
+  /// produced by the fragment instance that calls Send(). *eos is set to 'true' when
+  /// there are no more rows to consume. If Cancel() or Close() are called concurrently,
+  /// GetNext() will return and may not populate 'result_set'. All subsequent calls
+  /// after Cancel() or Close() are no-ops.
   Status GetNext(
       RuntimeState* state, QueryResultSet* result_set, int num_rows, bool* eos);
 
-  /// Signals to the producer that the sink will no longer be used. GetNext() may be
-  /// safely called after this returns (it does nothing), but consumers should consider
-  /// that the PlanRootSink may be undergoing destruction. May be called more than once;
-  /// only the first call has any effect.
-  void CloseConsumer();
+  /// Unblocks both the consumer and sender so they can check the cancellation flag in
+  /// the RuntimeState. The cancellation flag should be set prior to calling this.
+  void Cancel(RuntimeState* state);
 
   static const std::string NAME;
 
@@ -90,21 +94,22 @@ class PlanRootSink : public DataSink {
 
   /// Waited on by the sender only. Signalled when the consumer has written results_ and
   /// num_rows_requested_, and so the sender may begin satisfying that request for rows
-  /// from its current batch. Also signalled when CloseConsumer() is called, to unblock
-  /// the sender.
+  /// from its current batch. Also signalled when Cancel() is called, to unblock the
+  /// sender.
   ConditionVariable sender_cv_;
 
   /// Waited on by the consumer only. Signalled when the sender has finished serving a
-  /// request for rows. Also signalled by Close() and FlushFinal() to signal to the
-  /// consumer that no more rows are coming.
+  /// request for rows. Also signalled by FlushFinal(), Close() and Cancel() to unblock
+  /// the consumer.
   ConditionVariable consumer_cv_;
 
-  /// Signals to producer that the consumer is done, and the sink may be torn down.
-  bool consumer_done_ = false;
-
-  /// Signals to consumer that the sender is done, and that there are no more row batches
-  /// to consume.
-  bool sender_done_ = false;
+  /// State of the sender:
+  /// - ROWS_PENDING: the sender is still producing rows; the only non-terminal state
+  /// - EOS: the sender has passed all rows to Send()
+  /// - CLOSED_NOT_EOS: the sender (i.e. sink) was closed before all rows were passed to
+  ///   Send()
+  enum class SenderState { ROWS_PENDING, EOS, CLOSED_NOT_EOS };
+  SenderState sender_state_ = SenderState::ROWS_PENDING;
 
   /// The current result set passed to GetNext(), to fill in Send(). Not owned by this
   /// sink. Reset to nullptr after Send() completes the request to signal to the consumer
@@ -114,9 +119,6 @@ class PlanRootSink : public DataSink {
   /// Set by GetNext() to indicate to Send() how many rows it should write to results_.
   int num_rows_requested_ = 0;
 
-  /// Set to true in Send() and FlushFinal() when the Sink() has finished producing rows.
-  bool eos_ = false;
-
   /// Writes a single row into 'result' and 'scales' by evaluating
   /// output_expr_evals_ over 'row'.
   void GetRowValue(TupleRow* row, std::vector<void*>* result, std::vector<int>* scales);

http://git-wip-us.apache.org/repos/asf/impala/blob/482ea391/be/src/runtime/coordinator.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/coordinator.cc b/be/src/runtime/coordinator.cc
index 1b8dd83..998fee2 100644
--- a/be/src/runtime/coordinator.cc
+++ b/be/src/runtime/coordinator.cc
@@ -148,14 +148,8 @@ Status Coordinator::Exec() {
       DCHECK(!prepare_status.ok());
       return UpdateExecState(prepare_status, nullptr, FLAGS_hostname);
     }
-
-    // When GetFInstanceState() returns the coordinator instance, the Prepare phase
-    // is done and the FragmentInstanceState's root sink will be set up. At that point,
-    // the coordinator must be sure to call root_sink()->CloseConsumer(); the
-    // fragment instance's executor will not complete until that point.
-    // TODO: what does this mean?
-    // TODO: Consider moving this to Wait().
-    // TODO: clarify need for synchronization on this event
+    // When GetFInstanceState() returns the coordinator instance, the Prepare phase is
+    // done and the FragmentInstanceState's root sink will be set up.
     DCHECK(coord_instance_->IsPrepared() && coord_instance_->WaitForPrepare().ok());
     coord_sink_ = coord_instance_->root_sink();
     DCHECK(coord_sink_ != nullptr);
@@ -527,7 +521,6 @@ void Coordinator::HandleExecStateTransition(
       exec_rpcs_complete_barrier_->pending() <= 0) << "exec rpcs not completed";
 
   query_events_->MarkEvent(exec_state_to_event.at(new_state));
-  if (coord_sink_ != nullptr) coord_sink_->CloseConsumer();
   // This thread won the race to transitioning into a terminal state - terminate
   // execution and release resources.
   ReleaseExecResources();

http://git-wip-us.apache.org/repos/asf/impala/blob/482ea391/be/src/runtime/fragment-instance-state.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/fragment-instance-state.cc b/be/src/runtime/fragment-instance-state.cc
index a14bf31..c61cb81 100644
--- a/be/src/runtime/fragment-instance-state.cc
+++ b/be/src/runtime/fragment-instance-state.cc
@@ -103,13 +103,9 @@ void FragmentInstanceState::Cancel() {
   // being cancelled.
   discard_result(WaitForPrepare());
 
-  // Ensure that the sink is closed from both sides. Although in ordinary executions we
-  // rely on the consumer to do this, in error cases the consumer may not be able to send
-  // CloseConsumer() (see IMPALA-4348 for an example).
-  if (root_sink_ != nullptr) root_sink_->CloseConsumer();
-
   DCHECK(runtime_state_ != nullptr);
   runtime_state_->set_is_cancelled();
+  if (root_sink_ != nullptr) root_sink_->Cancel(runtime_state_);
   runtime_state_->stream_mgr()->Cancel(runtime_state_->fragment_instance_id());
 }
 


[2/5] impala git commit: [DOCS] Fixed misleading documentation on Impala + HDFS caching

Posted by ph...@apache.org.
[DOCS] Fixed misleading documentation on Impala + HDFS caching

Change-Id: I63cd1ff7b885a094a4a3e91c31101d25414b4db7
Reviewed-on: http://gerrit.cloudera.org:8080/10454
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/a5aa6ffd
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/a5aa6ffd
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/a5aa6ffd

Branch: refs/heads/master
Commit: a5aa6ffdaf850a2efe872a8ec9d648bfdf0c4cd2
Parents: 7485d60
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Fri May 18 14:32:02 2018 -0700
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Mon May 21 23:41:49 2018 +0000

----------------------------------------------------------------------
 docs/topics/impala_perf_hdfs_caching.xml | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/a5aa6ffd/docs/topics/impala_perf_hdfs_caching.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_perf_hdfs_caching.xml b/docs/topics/impala_perf_hdfs_caching.xml
index e1f25a6..8f0fbb9 100644
--- a/docs/topics/impala_perf_hdfs_caching.xml
+++ b/docs/topics/impala_perf_hdfs_caching.xml
@@ -270,14 +270,11 @@ show table stats census;
         location, dropping the table, and so on.
       </p>
 
-      <p>
-        When data is requested to be pinned in memory, that process happens in the background without blocking
-        access to the data while the caching is in progress. Loading the data from disk could take some time.
-        Impala reads each HDFS data block from memory if it has been pinned already, or from disk if it has not
-        been pinned yet. When files are added to a table or partition whose contents are cached, Impala
-        automatically detects those changes and performs a <codeph>REFRESH</codeph> automatically once the relevant
-        data is cached.
-      </p>
+      <p> When data is requested to be pinned in memory, that process happens in
+        the background without blocking access to the data while the caching is
+        in progress. Loading the data from disk could take some time. Impala
+        reads each HDFS data block from memory if it has been pinned already, or
+        from disk if it has not been pinned yet.</p>
 
       <p>
         The amount of data that you can pin on each node through the HDFS caching mechanism is subject to a quota


[5/5] impala git commit: IMPALA-7051: Serialize Maven invocations.

Posted by ph...@apache.org.
IMPALA-7051: Serialize Maven invocations.

I've observed some rare cases where Impala fails to build. I believe
it's because two Maven targets (yarn-extras and ext-data-source) are
being executed simultaneously. Maven's handling of ~/.m2/repository,
for example, is known not to be safe.

This patch serializes the Maven builds with the following
dependency graph:
  fe -> yarn-extras -> ext-data-source -> impala-parent
The ordering of yarn-extras -> ext-data-source is arbitrary.

I decided that this artificial dependency was the clearest
way to prevent parallel executions. Having mvn-quiet.sh
take a lock seemed considerably more complex.
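
A toy Python sketch of why the artificial edge works (this models the
dependency graph above; it is not Impala or CMake code): without the edge,
a parallel build may hand yarn-extras and ext-data-source to workers at the
same time; with it, the two Maven targets are never ready together.

  from graphlib import TopologicalSorter  # Python 3.9+

  deps = {
      "fe": {"yarn-extras"},
      "yarn-extras": {"impala-parent", "ext-data-source"},  # artificial edge
      "ext-data-source": {"impala-parent"},
  }

  ts = TopologicalSorter(deps)
  ts.prepare()
  while ts.is_active():
      ready = ts.get_ready()   # what a parallel build could start right now
      print(sorted(ready))     # the two Maven targets never appear together
      ts.done(*ready)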

Change-Id: Ie24f34f421bc7dcf9140938464d43400da95275e
Reviewed-on: http://gerrit.cloudera.org:8080/10460
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/23e11dc7
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/23e11dc7
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/23e11dc7

Branch: refs/heads/master
Commit: 23e11dc72662417059b1b7337d69e78c2ac4ba65
Parents: 21d92aa
Author: Philip Zeyliger <ph...@cloudera.com>
Authored: Fri May 18 16:36:58 2018 -0700
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Tue May 22 02:23:26 2018 +0000

----------------------------------------------------------------------
 common/yarn-extras/CMakeLists.txt | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/23e11dc7/common/yarn-extras/CMakeLists.txt
----------------------------------------------------------------------
diff --git a/common/yarn-extras/CMakeLists.txt b/common/yarn-extras/CMakeLists.txt
index 2b5f005..4f46ba5 100644
--- a/common/yarn-extras/CMakeLists.txt
+++ b/common/yarn-extras/CMakeLists.txt
@@ -15,6 +15,9 @@
 # specific language governing permissions and limitations
 # under the License.
 
-add_custom_target(yarn-extras ALL DEPENDS impala-parent
+# The dependency on ext-data-source here is fictional, but Maven does not like
+# concurrent invocations. These lead to opaque, non-deterministic errors due to
+# races in how Maven handles its ~/.m2/repository directory.
+add_custom_target(yarn-extras ALL DEPENDS impala-parent ext-data-source
   COMMAND $ENV{IMPALA_HOME}/bin/mvn-quiet.sh -B install -DskipTests
 )