Posted to commits@impala.apache.org by mi...@apache.org on 2022/11/21 18:52:25 UTC

[impala] branch master updated (c3ec9272c -> f8443d982)

This is an automated email from the ASF dual-hosted git repository.

michaelsmith pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git


    from c3ec9272c IMPALA-11724: Use CDP Ozone in test environment
     new 87e007725 IMPALA-11734: TestIcebergTable.test_compute_stats fails in RELEASE builds
     new f8443d982 IMPALA-11697: Enable SkipIf.not_hdfs tests for Ozone

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../queries/QueryTest/iceberg-compute-stats.test   |  5 ++-
 .../functional-query/queries/QueryTest/sfs.test    | 42 +++++++++++-----------
 tests/common/skip.py                               | 10 +++++-
 tests/custom_cluster/test_disable_features.py      |  4 +--
 tests/custom_cluster/test_hedged_reads.py          |  3 +-
 tests/custom_cluster/test_scratch_disk.py          | 12 +++----
 tests/data_errors/test_data_errors.py              |  9 +++--
 tests/metadata/test_ddl.py                         |  2 +-
 tests/query_test/test_acid.py                      |  4 +--
 tests/query_test/test_iceberg.py                   | 18 +++++-----
 tests/query_test/test_sfs.py                       |  7 ++--
 tests/stress/test_acid_stress.py                   |  2 +-
 tests/stress/test_insert_stress.py                 |  5 +--
 13 files changed, 72 insertions(+), 51 deletions(-)


[impala] 02/02: IMPALA-11697: Enable SkipIf.not_hdfs tests for Ozone

Posted by mi...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

michaelsmith pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit f8443d982891e3fd81e84c58394777580c9a5ea2
Author: Michael Smith <mi...@cloudera.com>
AuthorDate: Wed Nov 16 14:13:54 2022 -0800

    IMPALA-11697: Enable SkipIf.not_hdfs tests for Ozone
    
    Convert SkipIf.not_hdfs to SkipIf.not_dfs for tests that require
    filesystem semantics, adding more feature test coverage with Ozone.
    
    Creates a separate not_scratch_fs flag for scratch dir tests as they're
    not supported with Ozone yet. Filed IMPALA-11730 to address this.
    
    Preserves not_hdfs for a specific test that uses the dfsadmin CLI to put
    HDFS into safemode.
    
    Adds sfs_unsupported for SingleFileSystem (sfs) tests. This should work for
    many of our filesystems based on
    https://github.com/apache/hive/blob/ebb1e2fa9914bcccecad261d53338933b699ccb1/ql/src/java/org/apache/hadoop/hive/ql/io/SingleFileSystem.java#L62-L87.
    Makes sfs tests work on S3.
    
    Adds hardcoded_uris for IcebergV2 tests where deletes are implemented as
    hardcoded URIs in parquet files. Adding a parquet read/write library for
    Python is beyond the scope of this patch.
    
    Change-Id: Iafc1dac52d013e74a459fdc4336c26891a256ef1
    Reviewed-on: http://gerrit.cloudera.org:8080/19254
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Joe McDonnell <jo...@cloudera.com>
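
A minimal sketch of the skipif-marker pattern described above (assuming the
IS_HDFS and IS_OZONE flags from tests/util/filesystem_utils.py; the test
function itself is hypothetical):

    import pytest
    from tests.util.filesystem_utils import IS_HDFS, IS_OZONE

    # Skip unless the target filesystem provides HDFS-like semantics.
    not_dfs = pytest.mark.skipif(not (IS_HDFS or IS_OZONE),
        reason="HDFS/Ozone Filesystem needed")

    @not_dfs
    def test_requires_dfs_semantics():
        # Hypothetical body; runs only when the test filesystem is HDFS or Ozone.
        assert True

Applied as a function or class decorator, such a marker lets the same suite run
unchanged against Ozone wherever HDFS-specific tooling is not actually needed.
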
---
 .../functional-query/queries/QueryTest/sfs.test    | 42 +++++++++++-----------
 tests/common/skip.py                               | 10 +++++-
 tests/custom_cluster/test_disable_features.py      |  4 +--
 tests/custom_cluster/test_hedged_reads.py          |  3 +-
 tests/custom_cluster/test_scratch_disk.py          | 12 +++----
 tests/data_errors/test_data_errors.py              |  9 +++--
 tests/metadata/test_ddl.py                         |  2 +-
 tests/query_test/test_acid.py                      |  4 +--
 tests/query_test/test_iceberg.py                   | 18 +++++-----
 tests/query_test/test_sfs.py                       |  7 ++--
 tests/stress/test_acid_stress.py                   |  2 +-
 tests/stress/test_insert_stress.py                 |  5 +--
 12 files changed, 68 insertions(+), 50 deletions(-)

diff --git a/testdata/workloads/functional-query/queries/QueryTest/sfs.test b/testdata/workloads/functional-query/queries/QueryTest/sfs.test
index ec03e5d26..03cf028e2 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/sfs.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/sfs.test
@@ -3,7 +3,7 @@
 # We do not hardcode the host name to something like "localhost" since the host name may
 # be an IP address in a test environment.
 CREATE EXTERNAL TABLE test_tbl_01 (s STRING, i INT) STORED AS PARQUET
-LOCATION 'sfs+hdfs://$INTERNAL_LISTEN_HOST:20500/test-warehouse/$DATABASE.db/sfs_d1.parq/#SINGLEFILE#'
+LOCATION 'sfs+$NAMENODE/test-warehouse/$DATABASE.db/sfs_d1.parq/#SINGLEFILE#'
 ---- RESULTS
 'Table has been created.'
 ====
@@ -11,7 +11,7 @@ LOCATION 'sfs+hdfs://$INTERNAL_LISTEN_HOST:20500/test-warehouse/$DATABASE.db/sfs
 CREATE EXTERNAL TABLE test_tbl_02 (s STRING, i INT)
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
 STORED AS TEXTFILE
-LOCATION 'sfs+hdfs://$INTERNAL_LISTEN_HOST:20500/test-warehouse/$DATABASE.db/sfs_d2.txt/#SINGLEFILE#'
+LOCATION 'sfs+$NAMENODE/test-warehouse/$DATABASE.db/sfs_d2.txt/#SINGLEFILE#'
 ---- RESULTS
 'Table has been created.'
 ====
@@ -48,22 +48,22 @@ INSERT INTO TABLE test_tbl_02 VALUES ('x', 100);
 row_regex: .*Unable to INSERT into target table .+ because .+ is not a supported filesystem.*
 ====
 ---- QUERY
-LOAD DATA INPATH 'hdfs://$INTERNAL_LISTEN_HOST:20500/test-warehouse/$DATABASE.db/sfs_d3.parq' INTO TABLE test_tbl_01
+LOAD DATA INPATH '$NAMENODE/test-warehouse/$DATABASE.db/sfs_d3.parq' INTO TABLE test_tbl_01
 ---- CATCH
 Unsupported SFS filesystem operation!
 ====
 ---- QUERY
-LOAD DATA INPATH 'hdfs://$INTERNAL_LISTEN_HOST:20500/test-warehouse/$DATABASE.db/sfs_d4.txt' INTO TABLE test_tbl_02
+LOAD DATA INPATH '$NAMENODE/test-warehouse/$DATABASE.db/sfs_d4.txt' INTO TABLE test_tbl_02
 ---- CATCH
 Unsupported SFS filesystem operation!
 ====
 ---- QUERY
-LOAD DATA INPATH 'sfs+hdfs://$INTERNAL_LISTEN_HOST:20500/test-warehouse/$DATABASE.db/sfs_d3.parq/#SINGLEFILE#' INTO TABLE test_tbl_01
+LOAD DATA INPATH 'sfs+$NAMENODE/test-warehouse/$DATABASE.db/sfs_d3.parq/#SINGLEFILE#' INTO TABLE test_tbl_01
 ---- CATCH
 row_regex: .*INPATH location .+ must point to one of the supported filesystem URI scheme.*
 ====
 ---- QUERY
-LOAD DATA INPATH 'sfs+hdfs://$INTERNAL_LISTEN_HOST:20500/test-warehouse/$DATABASE.db/sfs_d4.txt/#SINGLEFILE#' INTO TABLE test_tbl_02
+LOAD DATA INPATH 'sfs+$NAMENODE/test-warehouse/$DATABASE.db/sfs_d4.txt/#SINGLEFILE#' INTO TABLE test_tbl_02
 ---- CATCH
 row_regex: .*INPATH location .+ must point to one of the supported filesystem URI scheme.*
 ====
@@ -89,7 +89,7 @@ COMPUTE STATS $DATABASE.test_tbl_02
 ====
 ---- QUERY
 CREATE EXTERNAL TABLE test_tbl_03_ext (s STRING, i INT) STORED AS PARQUET
-LOCATION 'sfs+hdfs://$INTERNAL_LISTEN_HOST:20500/test-warehouse/managed/$DATABASE.db/sfs_d3.parq/#SINGLEFILE#'
+LOCATION 'sfs+$NAMENODE/test-warehouse/managed/$DATABASE.db/sfs_d3.parq/#SINGLEFILE#'
 ---- RESULTS
 'Table has been created.'
 ====
@@ -97,7 +97,7 @@ LOCATION 'sfs+hdfs://$INTERNAL_LISTEN_HOST:20500/test-warehouse/managed/$DATABAS
 CREATE EXTERNAL TABLE test_tbl_04_ext (s STRING, i INT)
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
 STORED AS TEXTFILE
-LOCATION 'sfs+hdfs://$INTERNAL_LISTEN_HOST:20500/test-warehouse/managed/$DATABASE.db/sfs_d4.txt/#SINGLEFILE#'
+LOCATION 'sfs+$NAMENODE/test-warehouse/managed/$DATABASE.db/sfs_d4.txt/#SINGLEFILE#'
 ---- RESULTS
 'Table has been created.'
 ====
@@ -163,7 +163,7 @@ DROP TABLE test_tbl_04_ext;
 # The table can actually be created.
 CREATE TABLE test_tbl_03 (s STRING, i INT)
 STORED AS PARQUET
-LOCATION 'sfs+hdfs://$INTERNAL_LISTEN_HOST:20500/test-warehouse/managed/$DATABASE.db/sfs_d3.parq/#SINGLEFILE#'
+LOCATION 'sfs+$NAMENODE/test-warehouse/managed/$DATABASE.db/sfs_d3.parq/#SINGLEFILE#'
 ---- RESULTS
 'Table has been created.'
 ====
@@ -172,7 +172,7 @@ LOCATION 'sfs+hdfs://$INTERNAL_LISTEN_HOST:20500/test-warehouse/managed/$DATABAS
 CREATE TABLE test_tbl_04 (s STRING, i INT)
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
 STORED AS TEXTFILE
-LOCATION 'sfs+hdfs://$INTERNAL_LISTEN_HOST:20500/test-warehouse/managed/$DATABASE.db/sfs_d4.txt/#SINGLEFILE#'
+LOCATION 'sfs+$NAMENODE/test-warehouse/managed/$DATABASE.db/sfs_d4.txt/#SINGLEFILE#'
 ---- RESULTS
 'Table has been created.'
 ====
@@ -242,7 +242,7 @@ DROP TABLE test_tbl_04
 # test_tbl_05 can be created, which shows that sfs_d3.parq has not been deleted after
 # test_tbl_03 was dropped.
 CREATE TABLE test_tbl_05 (s STRING, i INT) STORED AS PARQUET
-LOCATION 'sfs+hdfs://$INTERNAL_LISTEN_HOST:20500/test-warehouse/managed/$DATABASE.db/sfs_d3.parq/#SINGLEFILE#'
+LOCATION 'sfs+$NAMENODE/test-warehouse/managed/$DATABASE.db/sfs_d3.parq/#SINGLEFILE#'
 ---- RESULTS
 'Table has been created.'
 ====
@@ -252,7 +252,7 @@ LOCATION 'sfs+hdfs://$INTERNAL_LISTEN_HOST:20500/test-warehouse/managed/$DATABAS
 CREATE TABLE test_tbl_06 (s STRING, i INT)
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
 STORED AS TEXTFILE
-LOCATION 'sfs+hdfs://$INTERNAL_LISTEN_HOST:20500/test-warehouse/managed/$DATABASE.db/sfs_d4.txt/#SINGLEFILE#'
+LOCATION 'sfs+$NAMENODE/test-warehouse/managed/$DATABASE.db/sfs_d4.txt/#SINGLEFILE#'
 ---- RESULTS
 'Table has been created.'
 ====
@@ -286,7 +286,7 @@ STRING, INT
 SET DEFAULT_TRANSACTIONAL_TYPE=INSERT_ONLY;
 CREATE TABLE test_tbl_03 (s STRING, i INT)
 STORED AS PARQUET
-LOCATION 'sfs+hdfs://$INTERNAL_LISTEN_HOST:20500/test-warehouse/managed/$DATABASE.db/sfs_d3.parq/#SINGLEFILE#'
+LOCATION 'sfs+$NAMENODE/test-warehouse/managed/$DATABASE.db/sfs_d3.parq/#SINGLEFILE#'
 ---- CATCH
 A managed table's location should be located within managed warehouse root directory or within its database's managedLocationUri.
 ====
@@ -297,7 +297,7 @@ SET DEFAULT_TRANSACTIONAL_TYPE=INSERT_ONLY;
 CREATE TABLE test_tbl_04 (s STRING, i INT)
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
 STORED AS TEXTFILE
-LOCATION 'sfs+hdfs://$INTERNAL_LISTEN_HOST:20500/test-warehouse/managed/$DATABASE.db/sfs_d4.txt/#SINGLEFILE#'
+LOCATION 'sfs+$NAMENODE/test-warehouse/managed/$DATABASE.db/sfs_d4.txt/#SINGLEFILE#'
 ---- CATCH
 A managed table's location should be located within managed warehouse root directory or within its database's managedLocationUri.
 ====
@@ -305,23 +305,25 @@ A managed table's location should be located within managed warehouse root direc
 SET DEFAULT_TRANSACTIONAL_TYPE=INSERT_ONLY;
 CREATE TABLE test_tbl_03 (s STRING, i INT)
 STORED AS PARQUET
-LOCATION 'hdfs://$INTERNAL_LISTEN_HOST:20500/test-warehouse/managed/$DATABASE.db/sfs_d3.parq'
----- CATCH
+LOCATION '$NAMENODE/test-warehouse/managed/$DATABASE.db/sfs_d3.parq'
+---- CATCH: ANY_OF
 Path is not a directory
+Path is a file
 ====
 ---- QUERY
 SET DEFAULT_TRANSACTIONAL_TYPE=INSERT_ONLY;
 CREATE TABLE test_tbl_04 (s STRING, i INT)
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
 STORED AS TEXTFILE
-LOCATION 'hdfs://$INTERNAL_LISTEN_HOST:20500/test-warehouse/managed/$DATABASE.db/sfs_d4.txt'
----- CATCH
+LOCATION '$NAMENODE/test-warehouse/managed/$DATABASE.db/sfs_d4.txt'
+---- CATCH: ANY_OF
 Path is not a directory
+Path is a file
 ====
 ---- QUERY
 # The table can actually be created but the contents of the table cannot be retrieved.
 CREATE EXTERNAL TABLE test_tbl_03 (s STRING) PARTITIONED BY (i INT) STORED AS PARQUET
-LOCATION 'sfs+hdfs://$INTERNAL_LISTEN_HOST:20500/test-warehouse/$DATABASE.db/sfs_d3.parq/#SINGLEFILE#'
+LOCATION 'sfs+$NAMENODE/test-warehouse/$DATABASE.db/sfs_d3.parq/#SINGLEFILE#'
 ---- RESULTS
 'Table has been created.'
 ====
@@ -330,7 +332,7 @@ LOCATION 'sfs+hdfs://$INTERNAL_LISTEN_HOST:20500/test-warehouse/$DATABASE.db/sfs
 CREATE EXTERNAL TABLE test_tbl_04 (s STRING) PARTITIONED BY (i INT)
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
 STORED AS TEXTFILE
-LOCATION 'sfs+hdfs://$INTERNAL_LISTEN_HOST:20500/test-warehouse/$DATABASE.db/sfs_d4.txt/#SINGLEFILE#'
+LOCATION 'sfs+$NAMENODE/test-warehouse/$DATABASE.db/sfs_d4.txt/#SINGLEFILE#'
 ---- RESULTS
 'Table has been created.'
 ====
diff --git a/tests/common/skip.py b/tests/common/skip.py
index 6ed5b477d..e5b856485 100644
--- a/tests/common/skip.py
+++ b/tests/common/skip.py
@@ -98,7 +98,15 @@ class SkipIf:
   skip_hbase = pytest.mark.skipif(pytest.config.option.skip_hbase,
       reason="--skip_hbase argument specified")
   not_s3 = pytest.mark.skipif(not IS_S3, reason="S3 Filesystem needed")
-  not_hdfs = pytest.mark.skipif(not IS_HDFS, reason="HDFS Filesystem needed")
+  not_hdfs = pytest.mark.skipif(not IS_HDFS, reason="HDFS admin needed")
+  not_dfs = pytest.mark.skipif(not (IS_HDFS or IS_OZONE),
+      reason="HDFS/Ozone Filesystem needed")
+  not_scratch_fs = pytest.mark.skipif(not IS_HDFS,
+      reason="Scratch dirs for temporary file spilling not supported")
+  sfs_unsupported = pytest.mark.skipif(not (IS_HDFS or IS_S3 or IS_ABFS or IS_ADLS
+      or IS_GCS), reason="Hive support for sfs+ is limited, HIVE-26757")
+  hardcoded_uris = pytest.mark.skipif(not IS_HDFS,
+      reason="Iceberg delete files hardcode the full URI in parquet files")
   not_ec = pytest.mark.skipif(not IS_EC, reason="Erasure Coding needed")
   no_secondary_fs = pytest.mark.skipif(not SECONDARY_FILESYSTEM,
       reason="Secondary filesystem needed")
diff --git a/tests/custom_cluster/test_disable_features.py b/tests/custom_cluster/test_disable_features.py
index 5ce270125..632322301 100644
--- a/tests/custom_cluster/test_disable_features.py
+++ b/tests/custom_cluster/test_disable_features.py
@@ -19,7 +19,7 @@ import pytest
 
 from tests.common.custom_cluster_test_suite import CustomClusterTestSuite
 from tests.common.parametrize import UniqueDatabase
-from tests.common.skip import SkipIf
+from tests.common.skip import SkipIfFS
 
 
 class TestDisableFeatures(CustomClusterTestSuite):
@@ -29,7 +29,7 @@ class TestDisableFeatures(CustomClusterTestSuite):
   def get_workload(self):
     return 'functional-query'
 
-  @SkipIf.not_hdfs
+  @SkipIfFS.hdfs_caching
   @pytest.mark.execute_serially
   @UniqueDatabase.parametrize(sync_ddl=True)
   @CustomClusterTestSuite.with_args(
diff --git a/tests/custom_cluster/test_hedged_reads.py b/tests/custom_cluster/test_hedged_reads.py
index b24fd924a..e1d36e73b 100644
--- a/tests/custom_cluster/test_hedged_reads.py
+++ b/tests/custom_cluster/test_hedged_reads.py
@@ -19,7 +19,8 @@ import pytest
 from tests.common.custom_cluster_test_suite import CustomClusterTestSuite
 from tests.common.skip import SkipIf
 
-@SkipIf.not_hdfs
+
+@SkipIf.not_dfs
 class TestHedgedReads(CustomClusterTestSuite):
   """ Exercises the hedged reads code path.
       NOTE: We unfortunately cannot force hedged reads on a minicluster, but we enable
diff --git a/tests/custom_cluster/test_scratch_disk.py b/tests/custom_cluster/test_scratch_disk.py
index a5ca75bbe..66492bcf7 100644
--- a/tests/custom_cluster/test_scratch_disk.py
+++ b/tests/custom_cluster/test_scratch_disk.py
@@ -277,7 +277,7 @@ class TestScratchDir(CustomClusterTestSuite):
     client.close()
 
   @pytest.mark.execute_serially
-  @SkipIf.not_hdfs
+  @SkipIf.not_scratch_fs
   def test_scratch_dirs_remote_spill(self, vector):
     # Test one remote directory with one its local buffer directory.
     normal_dirs = self.generate_dirs(1)
@@ -305,7 +305,7 @@ class TestScratchDir(CustomClusterTestSuite):
     client.close()
 
   @pytest.mark.execute_serially
-  @SkipIf.not_hdfs
+  @SkipIf.not_scratch_fs
   def test_scratch_dirs_mix_local_and_remote_dir_spill_local_only(self, vector):
     '''Two local directories, the first one is always used as local buffer for
        remote directories. Set the second directory big enough so that only spills
@@ -338,7 +338,7 @@ class TestScratchDir(CustomClusterTestSuite):
     client.close()
 
   @pytest.mark.execute_serially
-  @SkipIf.not_hdfs
+  @SkipIf.not_scratch_fs
   def test_scratch_dirs_mix_local_and_remote_dir_spill_both(self, vector):
     '''Two local directories, the first one is always used as local buffer for
        remote directories. Set the second directory small enough so that it spills
@@ -372,7 +372,7 @@ class TestScratchDir(CustomClusterTestSuite):
     client.close()
 
   @pytest.mark.execute_serially
-  @SkipIf.not_hdfs
+  @SkipIf.not_scratch_fs
   def test_scratch_dirs_remote_spill_with_options(self, vector):
     # One local buffer directory and one remote directory.
     normal_dirs = self.generate_dirs(1)
@@ -402,7 +402,7 @@ class TestScratchDir(CustomClusterTestSuite):
     client.close()
 
   @pytest.mark.execute_serially
-  @SkipIf.not_hdfs
+  @SkipIf.not_scratch_fs
   def test_scratch_dirs_remote_spill_concurrent(self, vector):
     '''Concurrently execute multiple queries that trigger the spilling to the remote
     directory to test if there is a deadlock issue.'''
@@ -449,7 +449,7 @@ class TestScratchDir(CustomClusterTestSuite):
     assert (total_size > 0 and total_size % (8 * 1024 * 1024) == 0)
 
   @pytest.mark.execute_serially
-  @SkipIf.not_hdfs
+  @SkipIf.not_scratch_fs
   def test_scratch_dirs_batch_reading(self, vector):
     # Set the buffer directory small enough to spill to the remote one.
     normal_dirs = self.generate_dirs(1)
diff --git a/tests/data_errors/test_data_errors.py b/tests/data_errors/test_data_errors.py
index 562eef2bf..d098dfed0 100644
--- a/tests/data_errors/test_data_errors.py
+++ b/tests/data_errors/test_data_errors.py
@@ -26,6 +26,8 @@ from tests.beeswax.impala_beeswax import ImpalaBeeswaxException
 from tests.common.impala_test_suite import ImpalaTestSuite
 from tests.common.skip import SkipIf, SkipIfFS
 from tests.common.test_dimensions import create_exec_option_dimension
+from tests.util.filesystem_utils import get_fs_path
+
 
 class TestDataErrors(ImpalaTestSuite):
   # batch_size of 1 can expose some interesting corner cases at row batch boundaries.
@@ -42,12 +44,13 @@ class TestDataErrors(ImpalaTestSuite):
   def get_workload(self):
     return 'functional-query'
 
+
 # Regression test for IMP-633. Added as a part of IMPALA-5198.
-@SkipIf.not_hdfs
+@SkipIf.not_dfs
 class TestHdfsFileOpenFailErrors(ImpalaTestSuite):
   @pytest.mark.execute_serially
   def test_hdfs_file_open_fail(self):
-    absolute_location = "/test-warehouse/file_open_fail"
+    absolute_location = get_fs_path("/test-warehouse/file_open_fail")
     create_stmt = \
         "create table file_open_fail (x int) location '" + absolute_location + "'"
     insert_stmt = "insert into file_open_fail values(1)"
@@ -64,6 +67,7 @@ class TestHdfsFileOpenFailErrors(ImpalaTestSuite):
       assert "Failed to open HDFS file" in str(e)
     self.client.execute(drop_stmt)
 
+
 # Test for IMPALA-5331 to verify that the libHDFS API hdfsGetLastExceptionRootCause()
 # works.
 @SkipIf.not_hdfs
@@ -161,6 +165,7 @@ class TestAvroErrors(TestDataErrors):
     vector.get_value('exec_option')['abort_on_error'] = 0
     self.run_test_case('DataErrorsTest/avro-errors', vector)
 
+
 class TestHBaseDataErrors(TestDataErrors):
   @classmethod
   def add_test_dimensions(cls):
diff --git a/tests/metadata/test_ddl.py b/tests/metadata/test_ddl.py
index ba590f8b5..303a258c8 100644
--- a/tests/metadata/test_ddl.py
+++ b/tests/metadata/test_ddl.py
@@ -455,7 +455,7 @@ class TestDdlStatements(TestDdlBase):
     self.run_test_case('QueryTest/alter-table', vector, use_db=unique_database,
         multiple_impalad=self._use_multiple_impalad(vector))
 
-  @SkipIf.not_hdfs
+  @SkipIfFS.hdfs_caching
   @SkipIfLocal.hdfs_client
   @UniqueDatabase.parametrize(sync_ddl=True, num_dbs=2)
   def test_alter_table_hdfs_caching(self, vector, unique_database):
diff --git a/tests/query_test/test_acid.py b/tests/query_test/test_acid.py
index d3d3ed17f..2b09d830a 100644
--- a/tests/query_test/test_acid.py
+++ b/tests/query_test/test_acid.py
@@ -304,7 +304,7 @@ class TestAcid(ImpalaTestSuite):
     assert len(self.execute_query("select * from {}".format(tbl_name)).data) == 0
 
   @SkipIfHive2.acid
-  @SkipIf.not_hdfs
+  @SkipIf.not_dfs
   def test_full_acid_schema_without_file_metadata_tag(self, vector, unique_database):
     """IMPALA-10115: Some files have full ACID schema without having
     'hive.acid.version' set. We still need to identify such files as full ACID"""
@@ -315,7 +315,7 @@ class TestAcid(ImpalaTestSuite):
     table_uri = self._get_table_location(fq_table_name, vector)
     acid_file = (os.environ['IMPALA_HOME'] +
         "/testdata/data/full_acid_schema_but_no_acid_version.orc")
-    self.hdfs_client.copy_from_local(acid_file, table_uri + "/bucket_00000")
+    self.filesystem_client.copy_from_local(acid_file, table_uri + "/bucket_00000")
     self.execute_query("refresh {}".format(fq_table_name))
     result = self.execute_query("select count(*) from {0}".format(fq_table_name))
     assert "3" in result.data
diff --git a/tests/query_test/test_iceberg.py b/tests/query_test/test_iceberg.py
index f264fd5e5..c737b9fde 100644
--- a/tests/query_test/test_iceberg.py
+++ b/tests/query_test/test_iceberg.py
@@ -126,17 +126,17 @@ class TestIcebergTable(IcebergTestSuite):
   # trigger a known bug: IMPALA-11509. Hence, turning this test off until there is a fix
   # for this issue. Note, we could add a sleep righ after table creation that could
   # workaround the above mentioned bug but then we would hit another issue: IMPALA-11502.
-  @SkipIf.not_hdfs
+  @SkipIf.not_dfs
   def test_drop_incomplete_table(self, vector, unique_database):
     """Test DROP TABLE when the underlying directory is deleted. In that case table
     loading fails, but we should be still able to drop the table from Impala."""
-    pytest.skip()
+    pytest.skip("Gets into a metadata update loop")
     tbl_name = unique_database + ".synchronized_iceberg_tbl"
-    cat_location = "/test-warehouse/" + unique_database
+    cat_location = get_fs_path("/test-warehouse/" + unique_database)
     self.client.execute("""create table {0} (i int) stored as iceberg
         tblproperties('iceberg.catalog'='hadoop.catalog',
                       'iceberg.catalog_location'='{1}')""".format(tbl_name, cat_location))
-    self.hdfs_client.delete_file_dir(cat_location, True)
+    self.filesystem_client.delete_file_dir(cat_location, True)
     self.execute_query_expect_success(self.client, """drop table {0}""".format(tbl_name))
 
   def test_insert(self, vector, unique_database):
@@ -455,7 +455,7 @@ class TestIcebergTable(IcebergTestSuite):
       except Exception as e:
         assert "Cannot find a snapshot older than" in str(e)
 
-  @SkipIf.not_hdfs
+  @SkipIf.not_dfs
   def test_strings_utf8(self, vector, unique_database):
     # Create table
     table_name = "ice_str_utf8"
@@ -542,7 +542,7 @@ class TestIcebergTable(IcebergTestSuite):
         os.remove(local_path)
     return datafiles
 
-  @SkipIf.not_hdfs
+  @SkipIf.not_dfs
   def test_writing_metrics_to_metadata(self, vector, unique_database):
     # Create table
     table_name = "ice_stats"
@@ -872,17 +872,17 @@ class TestIcebergV2Table(IcebergTestSuite):
   # the data files via full URI, i.e. they start with 'hdfs://localhost:2050/...'. In the
   # dockerised environment the namenode is accessible on a different hostname/port.
   @SkipIfDockerizedCluster.internal_hostname
-  @SkipIf.not_hdfs
+  @SkipIf.hardcoded_uris
   def test_read_position_deletes(self, vector):
     self.run_test_case('QueryTest/iceberg-v2-read-position-deletes', vector)
 
   @SkipIfDockerizedCluster.internal_hostname
-  @SkipIf.not_hdfs
+  @SkipIf.hardcoded_uris
   def test_read_position_deletes_orc(self, vector):
     self.run_test_case('QueryTest/iceberg-v2-read-position-deletes-orc', vector)
 
   @SkipIfDockerizedCluster.internal_hostname
-  @SkipIf.not_hdfs
+  @SkipIf.hardcoded_uris
   def test_table_sampling_v2(self, vector):
     self.run_test_case('QueryTest/iceberg-tablesample-v2', vector,
         use_db="functional_parquet")
diff --git a/tests/query_test/test_sfs.py b/tests/query_test/test_sfs.py
index 3973a1c4c..5db318e21 100644
--- a/tests/query_test/test_sfs.py
+++ b/tests/query_test/test_sfs.py
@@ -21,8 +21,10 @@
 from tests.common.file_utils import copy_files_to_hdfs_dir
 from tests.common.impala_test_suite import ImpalaTestSuite
 from tests.common.skip import SkipIf
+from tests.util.filesystem_utils import WAREHOUSE
 
 
+@SkipIf.sfs_unsupported
 class TestSFS(ImpalaTestSuite):
   @classmethod
   def get_workload(cls):
@@ -37,14 +39,13 @@ class TestSFS(ImpalaTestSuite):
     cls.ImpalaTestMatrix.add_constraint(lambda v:
         v.get_value('exec_option')['disable_codegen'] is False)
 
-  @SkipIf.not_hdfs
   def test_sfs(self, vector, unique_database):
     files_for_external_tables = ["testdata/data/sfs_d1.parq", "testdata/data/sfs_d2.txt",
                                  "testdata/data/sfs_d3.parq", "testdata/data/sfs_d4.txt"]
     files_for_managed_tables = ["testdata/data/sfs_d3.parq", "testdata/data/sfs_d4.txt"]
-    hdfs_dir_for_external_tables = "/test-warehouse/{0}.db/".format(unique_database)
+    hdfs_dir_for_external_tables = "{0}/{1}.db/".format(WAREHOUSE, unique_database)
     hdfs_dir_for_managed_tables =\
-        "/test-warehouse/managed/{0}.db/".format(unique_database)
+        "{0}/managed/{1}.db/".format(WAREHOUSE, unique_database)
 
     copy_files_to_hdfs_dir(files_for_external_tables, hdfs_dir_for_external_tables)
     copy_files_to_hdfs_dir(files_for_managed_tables, hdfs_dir_for_managed_tables)
diff --git a/tests/stress/test_acid_stress.py b/tests/stress/test_acid_stress.py
index 96a61854d..6ba33d65f 100644
--- a/tests/stress/test_acid_stress.py
+++ b/tests/stress/test_acid_stress.py
@@ -190,7 +190,7 @@ class TestAcidInsertsBasic(TestAcidStress):
 
   @pytest.mark.execute_serially
   @pytest.mark.stress
-  @SkipIf.not_hdfs
+  @SkipIf.not_dfs
   @UniqueDatabase.parametrize(sync_ddl=True)
   def test_partitioned_inserts(self, unique_database):
     """Check that the different ACID write operations take appropriate locks.
diff --git a/tests/stress/test_insert_stress.py b/tests/stress/test_insert_stress.py
index fe43a38c9..3b5af90d5 100644
--- a/tests/stress/test_insert_stress.py
+++ b/tests/stress/test_insert_stress.py
@@ -25,6 +25,7 @@ from tests.common.impala_test_suite import ImpalaTestSuite
 from tests.common.parametrize import UniqueDatabase
 from tests.common.skip import SkipIf
 from tests.stress.stress_util import run_tasks, Task
+from tests.util.filesystem_utils import WAREHOUSE
 
 
 # Stress test for concurrent INSERT operations.
@@ -103,7 +104,7 @@ class TestInsertStress(ImpalaTestSuite):
 
   @pytest.mark.execute_serially
   @pytest.mark.stress
-  @SkipIf.not_hdfs
+  @SkipIf.not_dfs
   @UniqueDatabase.parametrize(sync_ddl=True)
   def test_iceberg_inserts(self, unique_database):
     """Issues INSERT statements against multiple impalads in a way that some
@@ -114,7 +115,7 @@ class TestInsertStress(ImpalaTestSuite):
     self.client.execute("""create table {0} (wid int, i int) stored as iceberg
         tblproperties('iceberg.catalog'='hadoop.catalog',
                       'iceberg.catalog_location'='{1}')""".format(
-        tbl_name, '/test-warehouse/' + unique_database))
+        tbl_name, "{0}/{1}".format(WAREHOUSE, unique_database)))
 
     counter = Value('i', 0)
     num_writers = 4


[impala] 01/02: IMPALA-11734: TestIcebergTable.test_compute_stats fails in RELEASE builds

Posted by mi...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

michaelsmith pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 87e0077255104f5aba3cec4aeaab6096e608f4f9
Author: Daniel Becker <da...@cloudera.com>
AuthorDate: Mon Nov 21 09:52:24 2022 +0100

    IMPALA-11734: TestIcebergTable.test_compute_stats fails in RELEASE builds
    
    If the Impala version is set to a release build as described in point 8
    in the "How to Release" document
    (https://cwiki.apache.org/confluence/display/IMPALA/How+to+Release#HowtoRelease-HowtoVoteonaReleaseCandidate),
    TestIcebergTable.test_compute_stats fails:
    
    Stacktrace

    query_test/test_iceberg.py:852: in test_compute_stats
        self.run_test_case('QueryTest/iceberg-compute-stats', vector, unique_database)
    common/impala_test_suite.py:742: in run_test_case
        self.__verify_results_and_errors(vector, test_section, result, use_db)
    common/impala_test_suite.py:578: in __verify_results_and_errors
        replace_filenames_with_placeholder)
    common/test_result_verifier.py:469: in verify_raw_results
        VERIFIER_MAP[verifier](expected, actual)
    common/test_result_verifier.py:278: in verify_query_result_is_equal
        assert expected_results == actual_results
    E   assert Comparing QueryTestResults (expected vs actual):
    E   2,1,'2.33KB','NOT CACHED','NOT CACHED','PARQUET','false','hdfs://localhost:20500/test-warehouse/test_compute_stats_74dbc105.db/ice_alltypes'
    E   != 2,1,'2.32KB','NOT CACHED','NOT CACHED','PARQUET','false','hdfs://localhost:20500/test-warehouse/test_compute_stats_74dbc105.db/ice_alltypes'
    
    The problem is the file size, which is 2.32KB instead of 2.33KB. This is
    because the version is written into the file, and "x.y.z-RELEASE" is one
    byte shorter than "x.y.z-SNAPSHOT". The size of the file in this test is
    on the boundary between 2.32KB and 2.33KB, so this one byte can change
    the value.
    
    This change fixes the problem by using a regex to accept both values so
    it works for both snapshot and release versions.
    
    Change-Id: Ia1fa12eebf936ec2f4cc1d5f68ece2c96d1256fb
    Reviewed-on: http://gerrit.cloudera.org:8080/19260
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
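
As a quick illustration of the regex fix (plain Python, not part of the patch;
the pattern mirrors the one added to the .test file below):

    import re

    # '2.3[23]KB' accepts both the snapshot-build size (2.33KB) and the
    # release-build size (2.32KB), which differ by the one-byte version string.
    size_pattern = re.compile(r"2.3[23]KB")

    assert size_pattern.match("2.33KB")      # snapshot build
    assert size_pattern.match("2.32KB")      # release build
    assert not size_pattern.match("2.34KB")  # anything else still fails
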
---
 .../functional-query/queries/QueryTest/iceberg-compute-stats.test    | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test b/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test
index e07d4f976..111dcdeab 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test
@@ -21,7 +21,10 @@ show table stats ice_alltypes
 ---- LABELS
 #ROWS, #Files, Size, Bytes Cached, Cache Replication, Format, Incremental stats, Location
 ---- RESULTS: VERIFY_IS_EQUAL
-2,1,'2.33KB','NOT CACHED','NOT CACHED','PARQUET','false','$NAMENODE/test-warehouse/$DATABASE.db/ice_alltypes'
+# The file size is on the boundary between 2.32KB and 2.33KB. The build version is written
+# into the file, and "x.y.z-RELEASE" is one byte shorter than "x.y.z-SNAPSHOT". In release
+# builds the file size is 2.32KB, in snapshot builds it is 2.33KB.
+2,1,regex:'2.3[23]KB','NOT CACHED','NOT CACHED','PARQUET','false','$NAMENODE/test-warehouse/$DATABASE.db/ice_alltypes'
 ---- TYPES
 BIGINT,BIGINT,STRING,STRING,STRING,STRING,STRING,STRING
 ====