Posted to reviews@impala.apache.org by "Quanlong Huang (Code Review)" <ge...@cloudera.org> on 2021/02/25 12:23:06 UTC

[Impala-ASF-CR] [WIP] IMPALA-7712: Support Google Cloud Storage

Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17121 )


Change subject: [WIP] IMPALA-7712: Support Google Cloud Storage
......................................................................

[WIP] IMPALA-7712: Support Google Cloud Storage

This patch adds support for GCS (Google Cloud Storage).

TODO: ranger-audit-plugin includes dependencies on gcs-connector, but
they are for hadoop-2.x. Decide whether we can use them or use a 3.x version.

Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
TODO: Add tests
---
M be/src/exec/hdfs-table-sink.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M bin/impala-config-branch.sh
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M java/executor-deps/pom.xml
11 files changed, 69 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17121/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17121
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8329/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 10 Mar 2021 12:56:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8314/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 09 Mar 2021 10:40:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6962/ DRY_RUN=false


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 11
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 12 Mar 2021 23:18:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................

IMPALA-7712: Support Google Cloud Storage

This patch adds support for GCS (Google Cloud Storage) using the
gcs-connector. The implementation is similar to that of other remote
FileSystems.

New flags for GCS:
 - num_gcs_io_threads: Number of GCS I/O threads. Defaults to 16.
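As a rough illustration of what a per-filesystem thread count like num_gcs_io_threads controls, the sketch below classifies a path by its URI scheme and routes it to a dedicated queue sized by the matching flag. This is a hypothetical Python model, not Impala's C++ DiskIoMgr: the scheme table, the placeholder counts for S3 and ABFS, and the bucket and namenode address in the usage lines are assumptions. Only the GCS default of 16 comes from the commit message.

```python
from urllib.parse import urlparse

# Hypothetical flag values. Only the GCS default (16) comes from the commit
# message; the other entries are illustrative placeholders.
REMOTE_IO_THREADS = {
    "gs": 16,     # num_gcs_io_threads
    "s3a": 16,    # assumed placeholder for S3
    "abfs": 16,   # assumed placeholder for ABFS
    "abfss": 16,  # assumed placeholder for ABFS over TLS
}


def io_queue_for(path):
    """Return (queue_name, num_threads) for a file path.

    Paths on a remote object store go to a dedicated queue sized by the
    matching flag; anything else (e.g. hdfs:// or local paths) falls back to
    the per-disk queues, modelled here as a single 'disk' entry.
    """
    scheme = urlparse(path).scheme.lower()
    if scheme in REMOTE_IO_THREADS:
        return scheme, REMOTE_IO_THREADS[scheme]
    return "disk", None


print(io_queue_for("gs://my-bucket/test-warehouse/alltypes/000000_0"))
# ('gs', 16)
print(io_queue_for("hdfs://localhost:20500/test-warehouse/alltypes/000000_0"))
# ('disk', None)
```

Keeping remote stores on their own queues means slow object-store reads do not occupy the threads that serve local disks.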

Follow-up:
 - Support for spilling to GCS will be addressed in IMPALA-10561.
 - Support for caching GCS file handles will be addressed in
   IMPALA-10568.
 - test_concurrent_inserts and test_failing_inserts in
   test_acid_stress.py are skipped due to slow file listing on
   GCS (IMPALA-10562).
 - Some tests are skipped due to issues introduced by the /etc/hosts
   setting on GCE instances (IMPALA-10563).

Tests:
 - Compile and create HDFS test data on a GCE instance. Upload test data
   to a GCS bucket. Modify all locations in the HMS DB to point to the GCS
   bucket. Remove some HDFS caching params. Run CORE tests.
 - Compile and load snapshot data to a GCS bucket. Run CORE tests.
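The first test step rewrites table and partition locations stored in the HMS so they point at the GCS bucket. The commit message does not say how this was done, so the helper below is only a hypothetical Python sketch of the path rewrite itself. It assumes locations of the form hdfs://<namenode>/<path>; the bucket name and namenode address are placeholders, and it does not touch the HMS database.

```python
from urllib.parse import urlparse


def to_gcs_location(location, bucket):
    """Rewrite an hdfs:// location so it points at the same path in a GCS bucket.

    Hypothetical helper for illustration: non-HDFS locations are returned
    unchanged, so running the rewrite twice is harmless.
    """
    parsed = urlparse(location)
    if parsed.scheme != "hdfs":
        return location
    # Keep the path (e.g. /test-warehouse/...) and swap the scheme and authority.
    return "gs://{}{}".format(bucket, parsed.path)


print(to_gcs_location("hdfs://localhost:20500/test-warehouse/alltypes/year=2009",
                      "my-test-bucket"))
# gs://my-test-bucket/test-warehouse/alltypes/year=2009
```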

Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Reviewed-on: http://gerrit.cloudera.org:8080/17121
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/hdfs-table-sink.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/tmp-file-mgr.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M java/executor-deps/pom.xml
M java/pom.xml
M testdata/bin/create-load-data.sh
M testdata/bin/load-test-warehouse-snapshot.sh
M testdata/bin/run-all.sh
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
M tests/authorization/test_ranger.py
M tests/common/impala_test_suite.py
M tests/common/skip.py
M tests/custom_cluster/test_admission_controller.py
M tests/custom_cluster/test_coordinators.py
M tests/custom_cluster/test_event_processing.py
M tests/custom_cluster/test_hdfs_fd_caching.py
M tests/custom_cluster/test_hive_parquet_codec_interop.py
M tests/custom_cluster/test_hive_text_codec_interop.py
M tests/custom_cluster/test_insert_behaviour.py
M tests/custom_cluster/test_lineage.py
M tests/custom_cluster/test_local_catalog.py
M tests/custom_cluster/test_local_tz_conversion.py
M tests/custom_cluster/test_metadata_replicas.py
M tests/custom_cluster/test_parquet_max_page_header.py
M tests/custom_cluster/test_permanent_udfs.py
M tests/custom_cluster/test_query_retries.py
M tests/custom_cluster/test_restart_services.py
M tests/custom_cluster/test_topic_update_frequency.py
M tests/data_errors/test_data_errors.py
M tests/failure/test_failpoints.py
M tests/metadata/test_catalogd_debug_actions.py
M tests/metadata/test_compute_stats.py
M tests/metadata/test_ddl.py
M tests/metadata/test_hdfs_encryption.py
M tests/metadata/test_hdfs_permissions.py
M tests/metadata/test_hms_integration.py
M tests/metadata/test_metadata_query_statements.py
M tests/metadata/test_partition_metadata.py
M tests/metadata/test_refresh_partition.py
M tests/metadata/test_reset_metadata.py
M tests/metadata/test_stale_metadata.py
M tests/metadata/test_testcase_builder.py
M tests/metadata/test_views_compatibility.py
M tests/query_test/test_acid.py
M tests/query_test/test_aggregation.py
M tests/query_test/test_date_queries.py
M tests/query_test/test_hbase_queries.py
M tests/query_test/test_hdfs_caching.py
M tests/query_test/test_insert_behaviour.py
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_insert_permutation.py
M tests/query_test/test_join_queries.py
M tests/query_test/test_nested_types.py
M tests/query_test/test_observability.py
M tests/query_test/test_partitioning.py
M tests/query_test/test_resource_limits.py
M tests/query_test/test_scanners.py
M tests/shell/test_shell_commandline.py
M tests/stress/test_acid_stress.py
M tests/stress/test_ddl_stress.py
M tests/util/filesystem_utils.py
68 files changed, 303 insertions(+), 64 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 12
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6948/ DRY_RUN=true


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 09 Mar 2021 23:54:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] [WIP] IMPALA-7712: Support Google Cloud Storage

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: [WIP] IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17121/2/tests/custom_cluster/test_hive_text_codec_interop.py
File tests/custom_cluster/test_hive_text_codec_interop.py:

http://gerrit.cloudera.org:8080/#/c/17121/2/tests/custom_cluster/test_hive_text_codec_interop.py@24
PS2, Line 24: from tests.common.skip import SkipIfS3, SkipGCS
flake8: F401 'tests.common.skip.SkipGCS' imported but unused


http://gerrit.cloudera.org:8080/#/c/17121/2/tests/custom_cluster/test_hive_text_codec_interop.py@55
PS2, Line 55: S
flake8: F821 undefined name 'SkipIfGCS'
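Both flake8 findings come from the same slip: the module imports SkipGCS, but the decorator uses SkipIfGCS. Importing SkipIfGCS in place of SkipGCS would clear both F401 and F821. To show how such a skip marker behaves, the sketch below is a self-contained model built on the standard library's unittest.skipIf. Impala's real tests/common/skip.py is pytest-based, and the FILESYSTEM_PREFIX check and this jira() helper are assumptions, not its actual code.

```python
import os
import unittest

# Assumption: the target filesystem is detected from a FILESYSTEM_PREFIX
# environment variable such as "gs://my-bucket".
FILESYSTEM_PREFIX = os.getenv("FILESYSTEM_PREFIX", "")
IS_GCS = FILESYSTEM_PREFIX.startswith("gs://")


class SkipIfGCS:
    """Hypothetical stand-in for a skip-marker class like the one in tests/common/skip.py."""

    @staticmethod
    def jira(reason):
        # Skip the decorated test when running against GCS, citing the JIRA.
        return unittest.skipIf(IS_GCS, "Skipped on GCS: " + reason)


class ExampleTest(unittest.TestCase):
    @SkipIfGCS.jira(reason="IMPALA-10563")
    def test_concurrent_inserts(self):
        self.assertTrue(True)
```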



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Feb 2021 13:12:14 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [WIP] IMPALA-7712: Support Google Cloud Storage

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: [WIP] IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8237/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Feb 2021 12:42:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17121

to look at the new patch set (#10).

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................

IMPALA-7712: Support Google Cloud Storage

This patch adds support for GCS (Google Cloud Storage) using the
gcs-connector. The implementation is similar to that of other remote
FileSystems.

New flags for GCS:
 - num_gcs_io_threads: Number of GCS I/O threads. Defaults to 16.

Follow-up:
 - Support for spilling to GCS will be addressed in IMPALA-10561.
 - Support for caching GCS file handles will be addressed in
   IMPALA-10568.
 - test_concurrent_inserts and test_failing_inserts in
   test_acid_stress.py are skipped due to slow file listing on
   GCS (IMPALA-10562).
 - Some tests are skipped due to issues introduced by the /etc/hosts
   setting on GCE instances (IMPALA-10563).

Tests:
 - Compile and create HDFS test data on a GCE instance. Upload test data
   to a GCS bucket. Modify all locations in the HMS DB to point to the GCS
   bucket. Remove some HDFS caching params. Run CORE tests.
 - Compile and load snapshot data to a GCS bucket. Run CORE tests.

Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
---
M be/src/exec/hdfs-table-sink.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/tmp-file-mgr.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M java/executor-deps/pom.xml
M java/pom.xml
M testdata/bin/create-load-data.sh
M testdata/bin/load-test-warehouse-snapshot.sh
M testdata/bin/run-all.sh
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
M tests/authorization/test_ranger.py
M tests/common/impala_test_suite.py
M tests/common/skip.py
M tests/custom_cluster/test_admission_controller.py
M tests/custom_cluster/test_coordinators.py
M tests/custom_cluster/test_event_processing.py
M tests/custom_cluster/test_hdfs_fd_caching.py
M tests/custom_cluster/test_hive_parquet_codec_interop.py
M tests/custom_cluster/test_hive_text_codec_interop.py
M tests/custom_cluster/test_insert_behaviour.py
M tests/custom_cluster/test_lineage.py
M tests/custom_cluster/test_local_catalog.py
M tests/custom_cluster/test_local_tz_conversion.py
M tests/custom_cluster/test_metadata_replicas.py
M tests/custom_cluster/test_parquet_max_page_header.py
M tests/custom_cluster/test_permanent_udfs.py
M tests/custom_cluster/test_query_retries.py
M tests/custom_cluster/test_restart_services.py
M tests/custom_cluster/test_topic_update_frequency.py
M tests/data_errors/test_data_errors.py
M tests/failure/test_failpoints.py
M tests/metadata/test_catalogd_debug_actions.py
M tests/metadata/test_compute_stats.py
M tests/metadata/test_ddl.py
M tests/metadata/test_hdfs_encryption.py
M tests/metadata/test_hdfs_permissions.py
M tests/metadata/test_hms_integration.py
M tests/metadata/test_metadata_query_statements.py
M tests/metadata/test_partition_metadata.py
M tests/metadata/test_refresh_partition.py
M tests/metadata/test_reset_metadata.py
M tests/metadata/test_stale_metadata.py
M tests/metadata/test_testcase_builder.py
M tests/metadata/test_views_compatibility.py
M tests/query_test/test_acid.py
M tests/query_test/test_aggregation.py
M tests/query_test/test_date_queries.py
M tests/query_test/test_hbase_queries.py
M tests/query_test/test_hdfs_caching.py
M tests/query_test/test_insert_behaviour.py
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_insert_permutation.py
M tests/query_test/test_join_queries.py
M tests/query_test/test_nested_types.py
M tests/query_test/test_observability.py
M tests/query_test/test_partitioning.py
M tests/query_test/test_resource_limits.py
M tests/query_test/test_scanners.py
M tests/shell/test_shell_commandline.py
M tests/stress/test_acid_stress.py
M tests/stress/test_ddl_stress.py
M tests/util/filesystem_utils.py
68 files changed, 303 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17121/10

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 10
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17121

to look at the new patch set (#5).

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................

IMPALA-7712: Support Google Cloud Storage

This patch adds support for GCS (Google Cloud Storage) using the
gcs-connector. The implementation is similar to that of other remote
FileSystems.

New flags for GCS:
 - num_gcs_io_threads: Number of GCS I/O threads. Defaults to 16.
 - cache_gcs_file_handles: Enable the file handle cache for GCS files.
   Defaults to true.

Follow-up:
 - Support for spilling to GCS will be addressed in IMPALA-10561.
 - Some tests are skipped for further investigation (IMPALA-10562,
   IMPALA-10563).

Tests:
 - Compile and create HDFS test data on a GCE instance. Upload test data
   to a GCS bucket. Modify all locations in the HMS DB to point to the GCS
   bucket. Remove some HDFS caching params. Run CORE tests.
 - Compile and load snapshot data to a GCS bucket. Run CORE tests.

Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
---
M be/src/exec/hdfs-table-sink.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/scan-range.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M java/executor-deps/pom.xml
M java/pom.xml
M testdata/bin/create-load-data.sh
M testdata/bin/load-test-warehouse-snapshot.sh
M testdata/bin/run-all.sh
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
M tests/authorization/test_ranger.py
M tests/common/impala_test_suite.py
M tests/common/skip.py
M tests/custom_cluster/test_admission_controller.py
M tests/custom_cluster/test_coordinators.py
M tests/custom_cluster/test_event_processing.py
M tests/custom_cluster/test_hdfs_fd_caching.py
M tests/custom_cluster/test_hive_parquet_codec_interop.py
M tests/custom_cluster/test_hive_text_codec_interop.py
M tests/custom_cluster/test_insert_behaviour.py
M tests/custom_cluster/test_lineage.py
M tests/custom_cluster/test_local_catalog.py
M tests/custom_cluster/test_local_tz_conversion.py
M tests/custom_cluster/test_metadata_replicas.py
M tests/custom_cluster/test_parquet_max_page_header.py
M tests/custom_cluster/test_permanent_udfs.py
M tests/custom_cluster/test_query_retries.py
M tests/custom_cluster/test_restart_services.py
M tests/custom_cluster/test_topic_update_frequency.py
M tests/data_errors/test_data_errors.py
M tests/failure/test_failpoints.py
M tests/metadata/test_catalogd_debug_actions.py
M tests/metadata/test_compute_stats.py
M tests/metadata/test_ddl.py
M tests/metadata/test_hdfs_encryption.py
M tests/metadata/test_hdfs_permissions.py
M tests/metadata/test_hms_integration.py
M tests/metadata/test_metadata_query_statements.py
M tests/metadata/test_partition_metadata.py
M tests/metadata/test_refresh_partition.py
M tests/metadata/test_reset_metadata.py
M tests/metadata/test_stale_metadata.py
M tests/metadata/test_testcase_builder.py
M tests/metadata/test_views_compatibility.py
M tests/query_test/test_acid.py
M tests/query_test/test_aggregation.py
M tests/query_test/test_date_queries.py
M tests/query_test/test_hbase_queries.py
M tests/query_test/test_hdfs_caching.py
M tests/query_test/test_insert_behaviour.py
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_insert_permutation.py
M tests/query_test/test_join_queries.py
M tests/query_test/test_nested_types.py
M tests/query_test/test_observability.py
M tests/query_test/test_partitioning.py
M tests/query_test/test_resource_limits.py
M tests/query_test/test_scanners.py
M tests/shell/test_shell_commandline.py
M tests/stress/test_acid_stress.py
M tests/stress/test_ddl_stress.py
M tests/stress/test_insert_stress.py
M tests/util/filesystem_utils.py
70 files changed, 299 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17121/5

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 6:

(4 comments)

Thanks for your quick review, Joe! Addressed your comments. Also renamed some env vars to be consistent with other var names.

http://gerrit.cloudera.org:8080/#/c/17121/5/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/17121/5/be/src/runtime/io/disk-io-mgr.cc@186
PS5, Line 186: 
> We'll need to double check that GCS file handles work with the file handle 
Ah, sure. I thought tests/custom_cluster/test_hdfs_fd_caching.py provided the coverage. But looking into the gcs-connector code, GoogleHadoopFSInputStream doesn't implement the CanUnbuffer interface: https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/905e45d58a7b331f4b590815f0e6d0706022088d/gcs/src/main/java/com/google/cloud/hadoop/fs/gcs/GoogleHadoopFSInputStream.java#L31
So it doesn't support unbuffer() yet.

I'll remove this flag and leave it as a follow-up work in IMPALA-10568. Also filed a feature request for GCS: https://github.com/GoogleCloudDataproc/hadoop-connectors/issues/540
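Why unbuffer() matters here: a cached file handle keeps its input stream open, so each idle stream in the cache should be able to release its buffers and connections. The sketch below is a hypothetical Python model of that gating decision, not Hadoop's Java API. It treats a stream as cache-friendly only if it exposes an unbuffer() method, which loosely mirrors checking for the CanUnbuffer interface; the class and function names are invented for illustration.

```python
class UnbufferableStream:
    """Models a stream that can drop its buffers while idle (like one implementing CanUnbuffer)."""

    def __init__(self):
        self.buffered = True

    def unbuffer(self):
        self.buffered = False


class PlainStream:
    """Models a stream without unbuffer() support."""


def can_cache_handle(stream):
    # Only cache handles whose streams can release resources between uses;
    # otherwise many idle cached handles could each keep buffers pinned.
    return callable(getattr(stream, "unbuffer", None))


def release_for_cache(stream):
    """Prepare a stream to sit idle in the cache. Returns True if it may be cached."""
    if not can_cache_handle(stream):
        return False  # the caller should close the stream instead of caching it
    stream.unbuffer()
    return True
```

Under this model, a GCS stream without unbuffer() would never be cached, which matches the decision to drop the flag until IMPALA-10568.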


http://gerrit.cloudera.org:8080/#/c/17121/5/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/17121/5/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@863
PS5, Line 863:     }
             : 
             :     @Ov
> I have seen an issue like this before on older versions of the S3 connector
Good point!


http://gerrit.cloudera.org:8080/#/c/17121/5/testdata/bin/load-test-warehouse-snapshot.sh
File testdata/bin/load-test-warehouse-snapshot.sh:

http://gerrit.cloudera.org:8080/#/c/17121/5/testdata/bin/load-test-warehouse-snapshot.sh@80
PS5, Line 80:       hadoop fs -rm -r -skipTrash ${FILESYSTEM_PREFIX}${TEST_WAREHOUSE_DIR}
> I'm assuming that this command to remove any existing warehouse works for G
Yeah, the hadoop CLI works with GCS as well.


http://gerrit.cloudera.org:8080/#/c/17121/5/tests/stress/test_insert_stress.py
File tests/stress/test_insert_stress.py:

http://gerrit.cloudera.org:8080/#/c/17121/5/tests/stress/test_insert_stress.py@81
PS5, Line 81:   @SkipIfGCS.jira(reason="IMPALA-10563")
> Does IMPALA-10563 consistently reproduce? Do we have any idea if it is spec
Yes, it's consistently reproducible on GCE instances, even if I use a newer Hive version (3.1.3000.7.2.9.0-100).
I can always find exceptions related to write ID allocation in the HMS log. The operation is retried, but the retries cause a slowdown. It looks like a Hive bug to me, so it needs further investigation.

FWIW, I'm using GCE instance type n1-standard-16 (16cpu, 60GB RAM).



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 09 Mar 2021 10:22:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 6: Verified+1


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 10 Mar 2021 05:37:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 5:

(4 comments)

Here's my first pass. I'm also going to look at the ABFS change and see if anything is different.

http://gerrit.cloudera.org:8080/#/c/17121/5/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/17121/5/be/src/runtime/io/disk-io-mgr.cc@186
PS5, Line 186: true
We'll need to double check that GCS file handles work with the file handle cache. It would be ok to default to false until we validate it. In previous cases like ABFS, we started out with this disabled.

We would need to verify that unbuffer() is implemented and everything behaves well when there are thousands of these sitting around in the cache. It might just work.


http://gerrit.cloudera.org:8080/#/c/17121/5/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/17121/5/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@863
PS5, Line 863:       if (fs instanceof GoogleHadoopFileSystem) {
             :         curIter_.hasNext();
             :       }
I have seen an issue like this before on older versions of the S3 connector as well for recursively_list_partitions=false.

If recursively_list_partitions=false, we won't go through this codepath. We would have a FilterIterator that uses a fs.listStatusIterator(p) directly. See listStatus() in this file.

It might make sense for us to put this in the constructor for FilterIterator and do it for all filesystems.
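One way to read this suggestion, sketched in Python under an assumption the review does not state outright: if a connector's listing iterator defers the actual remote listing until the first hasNext() call, errors such as a missing directory surface later than the caller's error handling expects. Calling hasNext() once in the constructor forces that first listing to happen up front. LazyListing and FilterIterator below are hypothetical models, not Impala's Java classes, and the hidden-file filter is a placeholder.

```python
class LazyListing:
    """Models a remote iterator that lists the directory only on the first has_next()."""

    def __init__(self, fs, path):
        self._fs, self._path = fs, path
        self._items = None
        self._pos = 0

    def has_next(self):
        if self._items is None:
            if self._path not in self._fs:
                raise FileNotFoundError(self._path)
            self._items = list(self._fs[self._path])
        return self._pos < len(self._items)

    def next(self):
        if not self.has_next():
            raise StopIteration
        item = self._items[self._pos]
        self._pos += 1
        return item


class FilterIterator:
    """Wraps a listing, skips names starting with '.' or '_', and primes the listing eagerly."""

    def __init__(self, listing):
        self._listing = listing
        # Force the first remote call now, so a missing directory raises here,
        # inside the caller's try/except, rather than on some later has_next().
        self._listing.has_next()

    def __iter__(self):
        while self._listing.has_next():
            name = self._listing.next()
            if not name.startswith((".", "_")):
                yield name
```

Doing this in the constructor for every filesystem, as suggested, keeps the behavior the same whether or not a given connector lists lazily.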


http://gerrit.cloudera.org:8080/#/c/17121/5/testdata/bin/load-test-warehouse-snapshot.sh
File testdata/bin/load-test-warehouse-snapshot.sh:

http://gerrit.cloudera.org:8080/#/c/17121/5/testdata/bin/load-test-warehouse-snapshot.sh@80
PS5, Line 80:       hadoop fs -rm -r -skipTrash ${FILESYSTEM_PREFIX}${TEST_WAREHOUSE_DIR}
I'm assuming that this command to remove any existing warehouse works for GCS, because it looks like this is what we would use for ABFS/ADLS.


http://gerrit.cloudera.org:8080/#/c/17121/5/tests/stress/test_insert_stress.py
File tests/stress/test_insert_stress.py:

http://gerrit.cloudera.org:8080/#/c/17121/5/tests/stress/test_insert_stress.py@81
PS5, Line 81:   @SkipIfGCS.jira(reason="IMPALA-10563")
Does IMPALA-10563 consistently reproduce? Do we have any idea if it is specific to the minicluster environment or if it could happen on a real cluster?



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 09 Mar 2021 01:04:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 7:

(5 comments)

> I had a couple small nits, but this makes sense to me. The only concern I would have is if IMPALA-10563 is more than just a slow down.

Sure. I'm testing concurrent inserts on a real cluster on GCP to see if the issue occurs.

http://gerrit.cloudera.org:8080/#/c/17121/6/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/17121/6/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@804
PS6, Line 804: hasNex
> Nit: hasNext()
Done


http://gerrit.cloudera.org:8080/#/c/17121/6/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@805
PS6, Line 805: hasNext(
> Nit: hasNext()
Done


http://gerrit.cloudera.org:8080/#/c/17121/6/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@806
PS6, Line 806: hasNext(
> Nit: hasNext()
Done


http://gerrit.cloudera.org:8080/#/c/17121/6/tests/custom_cluster/test_hdfs_fd_caching.py
File tests/custom_cluster/test_hdfs_fd_caching.py:

http://gerrit.cloudera.org:8080/#/c/17121/6/tests/custom_cluster/test_hdfs_fd_caching.py@132
PS6, Line 132:     # Caching applies to HDFS, S3, and ABFS files. If this is HDFS, S3, or ABFS, then
             :     # verify that caching works. Otherwise, verify that file handles are not cached.
> Nit: Now we don't cache GCS file handles, so this needs to be updated.
Done
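The updated comment boils down to a simple predicate: HDFS, S3 and ABFS file handles are cached, GCS ones are not (caching GCS handles is the IMPALA-10568 follow-up). A minimal sketch, assuming the filesystem is identified by a short label; the labels and function name are illustrative and not the ones used in tests/util/filesystem_utils.py.

```python
# Illustrative labels only; the real test uses the filesystem flags
# exported by tests/util/filesystem_utils.py.
CACHED_FILESYSTEMS = frozenset({"hdfs", "s3", "abfs"})


def expect_fd_caching(target_filesystem):
    """Return True if file handle caching should be observed.

    GCS is intentionally absent: with this change GCS file handles are
    not cached (follow-up IMPALA-10568).
    """
    return target_filesystem in CACHED_FILESYSTEMS
```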


http://gerrit.cloudera.org:8080/#/c/17121/5/tests/stress/test_insert_stress.py
File tests/stress/test_insert_stress.py:

http://gerrit.cloudera.org:8080/#/c/17121/5/tests/stress/test_insert_stress.py@81
PS5, Line 81:   @SkipIfGCS.jira(reason="IMPALA-10563")
> Ok, to be clear, the statement runs slower, but it does eventually complete
Yeah, within the timeout period (600s), only half of the inserts finish. I'm trying to see whether extending the timeout lets it pass.
BTW, on HDFS, this test takes 36s. On S3, this test takes 90s.

I'm also testing concurrent inserts on a real cluster on GCP.
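To make the symptom concrete (a fixed deadline, with only part of the concurrent workload finishing), here is a minimal sketch of that kind of measurement. It uses concurrent.futures as a stand-in for the stress test's own harness, which is not shown in this thread; each task is a zero-argument callable standing in for one INSERT.

```python
import concurrent.futures
import time


def run_with_deadline(tasks, timeout_s, max_workers=4):
    """Run tasks concurrently; return (finished, unfinished) after timeout_s."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)
    futures = [pool.submit(task) for task in tasks]
    done, not_done = concurrent.futures.wait(futures, timeout=timeout_s)
    for future in not_done:
        # Cancels tasks that never started; tasks already running keep going.
        future.cancel()
    # Return without waiting for stragglers.
    pool.shutdown(wait=False)
    return len(done), len(not_done)


if __name__ == "__main__":
    fast = [lambda: None] * 2
    slow = [lambda: time.sleep(0.5)] * 2
    print(run_with_deadline(fast + slow, timeout_s=0.2))
```

Raising the deadline, as tried above, only changes timeout_s; it does not distinguish "slower" from "stuck", which is why the real-cluster run is useful.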



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 10 Mar 2021 01:48:15 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [WIP] IMPALA-7712: Support Google Cloud Storage

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: [WIP] IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17121/2/tests/custom_cluster/test_hive_text_codec_interop.py
File tests/custom_cluster/test_hive_text_codec_interop.py:

http://gerrit.cloudera.org:8080/#/c/17121/2/tests/custom_cluster/test_hive_text_codec_interop.py@24
PS2, Line 24: from tests.common.skip import SkipIfS3, SkipIfGCS
> flake8: F401 'tests.common.skip.SkipGCS' imported but unused
Done


http://gerrit.cloudera.org:8080/#/c/17121/2/tests/custom_cluster/test_hive_text_codec_interop.py@55
PS2, Line 55: S
> flake8: F821 undefined name 'SkipIfGCS'
Done
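Both flake8 findings (F401: a name imported but unused, F821: a name used but never defined) come down to importing the skip helper under the same name that the decorator uses, SkipIfGCS. For context, a minimal sketch of such a helper. It uses unittest.skipIf as a stand-in; Impala's actual tests/common/skip.py may be built differently (e.g. on pytest markers), and the environment variable and its value are assumptions.

```python
import os
import unittest

# Assumption: the active filesystem is read from an environment variable.
# Name and value are illustrative, not taken from tests/common/skip.py.
IS_GCS = os.environ.get("TARGET_FILESYSTEM") == "gs"


class SkipIfGCS(object):
    """Hypothetical analogue of the GCS skip helper used in these tests."""

    active = IS_GCS

    @classmethod
    def jira(cls, reason):
        # Skip the decorated test on GCS, recording the tracking JIRA.
        return unittest.skipIf(cls.active, reason)
```

Usage matches the call sites in this change, e.g. @SkipIfGCS.jira(reason="IMPALA-10563").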



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 02 Mar 2021 13:32:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17121

to look at the new patch set (#8).

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................

IMPALA-7712: Support Google Cloud Storage

This patch adds support for GCS (Google Cloud Storage). It uses the
gcs-connector, and the implementation is similar to those of the other
remote FileSystems.

New flags for GCS:
 - num_gcs_io_threads: Number of GCS I/O threads. Defaults to 16.
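A sketch of the idea behind a per-filesystem thread flag: a remote path is routed by its URI scheme to an I/O queue whose size comes from that filesystem's flag. This is an illustration in Python, not the C++ logic in be/src/runtime/io/disk-io-mgr.cc; only num_gcs_io_threads and its default of 16 come from this description, and the mapping and function name are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to the flag sizing its I/O queue.
# Only the GCS entry comes from this patch; other remote filesystems
# would get entries of their own.
IO_THREAD_FLAGS = {
    "gs": ("num_gcs_io_threads", 16),
}


def io_queue_for(path, flags=IO_THREAD_FLAGS):
    """Return (flag_name, num_threads) for a remote path, or None otherwise."""
    scheme = urlparse(path).scheme  # urlparse lower-cases the scheme
    return flags.get(scheme)
```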

Follow-up:
 - Support for spilling to GCS will be addressed in IMPALA-10561.
 - Some tests are skipped for further investigation (IMPALA-10562,
   IMPALA-10563).

Tests:
 - Compile and create hdfs test data on a GCE instance. Upload test data
   to a GCS bucket. Modify all locations in HMS DB to point to the GCS
   bucket. Remove some hdfs caching params. Run CORE tests.
 - Compile and load snapshot data to a GCS bucket. Run CORE tests.
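The "modify all locations in HMS DB" step is essentially a prefix swap on every stored location. A minimal sketch, assuming locations are plain URI strings; the prefixes used in the example are illustrative, not the ones from the actual test setup, and the helper is hypothetical.

```python
def rewrite_location(location, old_prefix, new_prefix):
    """Point a table/partition location at the GCS bucket (hypothetical).

    Only rewrites when old_prefix matches a whole path component, so
    "hdfs://host:20500" does not match "hdfs://host:205001/...".
    Other locations are returned unchanged.
    """
    old = old_prefix.rstrip("/")
    if location == old or location.startswith(old + "/"):
        return new_prefix.rstrip("/") + location[len(old):]
    return location
```

For example, rewrite_location("hdfs://localhost:20500/test-warehouse/alltypes", "hdfs://localhost:20500", "gs://my-bucket") gives "gs://my-bucket/test-warehouse/alltypes".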

Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
---
M be/src/exec/hdfs-table-sink.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/tmp-file-mgr.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M java/executor-deps/pom.xml
M java/pom.xml
M testdata/bin/create-load-data.sh
M testdata/bin/load-test-warehouse-snapshot.sh
M testdata/bin/run-all.sh
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
M tests/authorization/test_ranger.py
M tests/common/impala_test_suite.py
M tests/common/skip.py
M tests/custom_cluster/test_admission_controller.py
M tests/custom_cluster/test_coordinators.py
M tests/custom_cluster/test_event_processing.py
M tests/custom_cluster/test_hdfs_fd_caching.py
M tests/custom_cluster/test_hive_parquet_codec_interop.py
M tests/custom_cluster/test_hive_text_codec_interop.py
M tests/custom_cluster/test_insert_behaviour.py
M tests/custom_cluster/test_lineage.py
M tests/custom_cluster/test_local_catalog.py
M tests/custom_cluster/test_local_tz_conversion.py
M tests/custom_cluster/test_metadata_replicas.py
M tests/custom_cluster/test_parquet_max_page_header.py
M tests/custom_cluster/test_permanent_udfs.py
M tests/custom_cluster/test_query_retries.py
M tests/custom_cluster/test_restart_services.py
M tests/custom_cluster/test_topic_update_frequency.py
M tests/data_errors/test_data_errors.py
M tests/failure/test_failpoints.py
M tests/metadata/test_catalogd_debug_actions.py
M tests/metadata/test_compute_stats.py
M tests/metadata/test_ddl.py
M tests/metadata/test_hdfs_encryption.py
M tests/metadata/test_hdfs_permissions.py
M tests/metadata/test_hms_integration.py
M tests/metadata/test_metadata_query_statements.py
M tests/metadata/test_partition_metadata.py
M tests/metadata/test_refresh_partition.py
M tests/metadata/test_reset_metadata.py
M tests/metadata/test_stale_metadata.py
M tests/metadata/test_testcase_builder.py
M tests/metadata/test_views_compatibility.py
M tests/query_test/test_acid.py
M tests/query_test/test_aggregation.py
M tests/query_test/test_date_queries.py
M tests/query_test/test_hbase_queries.py
M tests/query_test/test_hdfs_caching.py
M tests/query_test/test_insert_behaviour.py
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_insert_permutation.py
M tests/query_test/test_join_queries.py
M tests/query_test/test_nested_types.py
M tests/query_test/test_observability.py
M tests/query_test/test_partitioning.py
M tests/query_test/test_resource_limits.py
M tests/query_test/test_scanners.py
M tests/shell/test_shell_commandline.py
M tests/stress/test_acid_stress.py
M tests/stress/test_ddl_stress.py
M tests/util/filesystem_utils.py
68 files changed, 302 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17121/8
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 7: Code-Review+2

On second thought, I'm in favor of merging the basic support even if we know about some problem cases. It is better to have the code in, and we can document any known issues in release notes if necessary.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 10 Mar 2021 01:54:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17121

to look at the new patch set (#6).

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................

IMPALA-7712: Support Google Cloud Storage

This patch adds support for GCS (Google Cloud Storage). It uses the
gcs-connector, and the implementation is similar to those of the other
remote FileSystems.

New flags for GCS:
 - num_gcs_io_threads: Number of GCS I/O threads. Defaults to 16.

Follow-up:
 - Support for spilling to GCS will be addressed in IMPALA-10561.
 - Some tests are skipped for further investigation (IMPALA-10562,
   IMPALA-10563).

Tests:
 - Compile and create hdfs test data on a GCE instance. Upload test data
   to a GCS bucket. Modify all locations in HMS DB to point to the GCS
   bucket. Remove some hdfs caching params. Run CORE tests.
 - Compile and load snapshot data to a GCS bucket. Run CORE tests.

Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
---
M be/src/exec/hdfs-table-sink.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/tmp-file-mgr.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M java/executor-deps/pom.xml
M java/pom.xml
M testdata/bin/create-load-data.sh
M testdata/bin/load-test-warehouse-snapshot.sh
M testdata/bin/run-all.sh
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
M tests/authorization/test_ranger.py
M tests/common/impala_test_suite.py
M tests/common/skip.py
M tests/custom_cluster/test_admission_controller.py
M tests/custom_cluster/test_coordinators.py
M tests/custom_cluster/test_event_processing.py
M tests/custom_cluster/test_hdfs_fd_caching.py
M tests/custom_cluster/test_hive_parquet_codec_interop.py
M tests/custom_cluster/test_hive_text_codec_interop.py
M tests/custom_cluster/test_insert_behaviour.py
M tests/custom_cluster/test_lineage.py
M tests/custom_cluster/test_local_catalog.py
M tests/custom_cluster/test_local_tz_conversion.py
M tests/custom_cluster/test_metadata_replicas.py
M tests/custom_cluster/test_parquet_max_page_header.py
M tests/custom_cluster/test_permanent_udfs.py
M tests/custom_cluster/test_query_retries.py
M tests/custom_cluster/test_restart_services.py
M tests/custom_cluster/test_topic_update_frequency.py
M tests/data_errors/test_data_errors.py
M tests/failure/test_failpoints.py
M tests/metadata/test_catalogd_debug_actions.py
M tests/metadata/test_compute_stats.py
M tests/metadata/test_ddl.py
M tests/metadata/test_hdfs_encryption.py
M tests/metadata/test_hdfs_permissions.py
M tests/metadata/test_hms_integration.py
M tests/metadata/test_metadata_query_statements.py
M tests/metadata/test_partition_metadata.py
M tests/metadata/test_refresh_partition.py
M tests/metadata/test_reset_metadata.py
M tests/metadata/test_stale_metadata.py
M tests/metadata/test_testcase_builder.py
M tests/metadata/test_views_compatibility.py
M tests/query_test/test_acid.py
M tests/query_test/test_aggregation.py
M tests/query_test/test_date_queries.py
M tests/query_test/test_hbase_queries.py
M tests/query_test/test_hdfs_caching.py
M tests/query_test/test_insert_behaviour.py
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_insert_permutation.py
M tests/query_test/test_join_queries.py
M tests/query_test/test_nested_types.py
M tests/query_test/test_observability.py
M tests/query_test/test_partitioning.py
M tests/query_test/test_resource_limits.py
M tests/query_test/test_scanners.py
M tests/shell/test_shell_commandline.py
M tests/stress/test_acid_stress.py
M tests/stress/test_ddl_stress.py
M tests/stress/test_insert_stress.py
M tests/util/filesystem_utils.py
69 files changed, 296 insertions(+), 65 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17121/6
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] [WIP] IMPALA-7712: Support Google Cloud Storage

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: [WIP] IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17121/1/bin/impala-config-branch.sh
File bin/impala-config-branch.sh:

http://gerrit.cloudera.org:8080/#/c/17121/1/bin/impala-config-branch.sh@27
PS1, Line 27: export GOOGLE_APPLICATION_CREDENTIALS="/home/quanlong/workspace/Impala/hql-impala-key.json"
line too long (91 > 90)



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Feb 2021 12:23:58 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [WIP] IMPALA-7712: Support Google Cloud Storage

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17121

to look at the new patch set (#3).

Change subject: [WIP] IMPALA-7712: Support Google Cloud Storage
......................................................................

[WIP] IMPALA-7712: Support Google Cloud Storage

This patch adds support for GCS (Google Cloud Storage).

Test steps:
 - Compile and create test data on a GCE instance.
 - Upload test data to a GCS bucket.
 - Modify the filesystem prefix of all locations in HMS DB to point to
   the GCS bucket. Remove some hdfs caching params.
 - TODO: Run CORE tests.

Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
---
M be/src/exec/hdfs-table-sink.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M java/executor-deps/pom.xml
M java/pom.xml
M testdata/bin/create-load-data.sh
M testdata/bin/load-test-warehouse-snapshot.sh
M testdata/bin/run-all.sh
M tests/authorization/test_ranger.py
M tests/common/impala_test_suite.py
M tests/common/skip.py
M tests/custom_cluster/test_admission_controller.py
M tests/custom_cluster/test_coordinators.py
M tests/custom_cluster/test_event_processing.py
M tests/custom_cluster/test_hive_parquet_codec_interop.py
M tests/custom_cluster/test_hive_text_codec_interop.py
M tests/custom_cluster/test_insert_behaviour.py
M tests/custom_cluster/test_lineage.py
M tests/custom_cluster/test_local_catalog.py
M tests/custom_cluster/test_local_tz_conversion.py
M tests/custom_cluster/test_metadata_replicas.py
M tests/custom_cluster/test_parquet_max_page_header.py
M tests/custom_cluster/test_permanent_udfs.py
M tests/custom_cluster/test_topic_update_frequency.py
M tests/data_errors/test_data_errors.py
M tests/failure/test_failpoints.py
M tests/metadata/test_catalogd_debug_actions.py
M tests/metadata/test_compute_stats.py
M tests/metadata/test_ddl.py
M tests/metadata/test_hdfs_encryption.py
M tests/metadata/test_hdfs_permissions.py
M tests/metadata/test_hms_integration.py
M tests/metadata/test_metadata_query_statements.py
M tests/metadata/test_partition_metadata.py
M tests/metadata/test_refresh_partition.py
M tests/metadata/test_reset_metadata.py
M tests/metadata/test_stale_metadata.py
M tests/metadata/test_views_compatibility.py
M tests/query_test/test_acid.py
M tests/query_test/test_aggregation.py
M tests/query_test/test_date_queries.py
M tests/query_test/test_hbase_queries.py
M tests/query_test/test_hdfs_caching.py
M tests/query_test/test_insert_behaviour.py
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_insert_permutation.py
M tests/query_test/test_join_queries.py
M tests/query_test/test_nested_types.py
M tests/query_test/test_observability.py
M tests/query_test/test_partitioning.py
M tests/query_test/test_resource_limits.py
M tests/query_test/test_scanners.py
M tests/stress/test_acid_stress.py
M tests/stress/test_ddl_stress.py
M tests/util/filesystem_utils.py
61 files changed, 263 insertions(+), 53 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17121/3
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 8:

Fixed the infinite loop found in tests/stress/test_insert_stress.py.
Still investigating the other two timeout issues in IMPALA-10563.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 10 Mar 2021 12:37:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17121

to look at the new patch set (#9).

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................

IMPALA-7712: Support Google Cloud Storage

This patch adds support for GCS (Google Cloud Storage). It uses the
gcs-connector, and the implementation is similar to those of the other
remote FileSystems.

New flags for GCS:
 - num_gcs_io_threads: Number of GCS I/O threads. Defaults to 16.

Follow-up:
 - Support for spilling to GCS will be addressed in IMPALA-10561.
 - Support for caching GCS file handles will be addressed in
   IMPALA-10568.
 - test_concurrent_inserts and test_failing_inserts in
   test_acid_stress.py are skipped due to slow file listing on
   GCS (IMPALA-10562).
 - Some tests are skipped due to issues caused by the /etc/hosts settings
   on GCE instances (IMPALA-10563).

Tests:
 - Compile and create hdfs test data on a GCE instance. Upload test data
   to a GCS bucket. Modify all locations in HMS DB to point to the GCS
   bucket. Remove some hdfs caching params. Run CORE tests.
 - Compile and load snapshot data to a GCS bucket. Run CORE tests.

Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
---
M be/src/exec/hdfs-table-sink.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/tmp-file-mgr.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M java/executor-deps/pom.xml
M java/pom.xml
M testdata/bin/create-load-data.sh
M testdata/bin/load-test-warehouse-snapshot.sh
M testdata/bin/run-all.sh
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
M tests/authorization/test_ranger.py
M tests/common/impala_test_suite.py
M tests/common/skip.py
M tests/custom_cluster/test_admission_controller.py
M tests/custom_cluster/test_coordinators.py
M tests/custom_cluster/test_event_processing.py
M tests/custom_cluster/test_hdfs_fd_caching.py
M tests/custom_cluster/test_hive_parquet_codec_interop.py
M tests/custom_cluster/test_hive_text_codec_interop.py
M tests/custom_cluster/test_insert_behaviour.py
M tests/custom_cluster/test_lineage.py
M tests/custom_cluster/test_local_catalog.py
M tests/custom_cluster/test_local_tz_conversion.py
M tests/custom_cluster/test_metadata_replicas.py
M tests/custom_cluster/test_parquet_max_page_header.py
M tests/custom_cluster/test_permanent_udfs.py
M tests/custom_cluster/test_query_retries.py
M tests/custom_cluster/test_restart_services.py
M tests/custom_cluster/test_topic_update_frequency.py
M tests/data_errors/test_data_errors.py
M tests/failure/test_failpoints.py
M tests/metadata/test_catalogd_debug_actions.py
M tests/metadata/test_compute_stats.py
M tests/metadata/test_ddl.py
M tests/metadata/test_hdfs_encryption.py
M tests/metadata/test_hdfs_permissions.py
M tests/metadata/test_hms_integration.py
M tests/metadata/test_metadata_query_statements.py
M tests/metadata/test_partition_metadata.py
M tests/metadata/test_refresh_partition.py
M tests/metadata/test_reset_metadata.py
M tests/metadata/test_stale_metadata.py
M tests/metadata/test_testcase_builder.py
M tests/metadata/test_views_compatibility.py
M tests/query_test/test_acid.py
M tests/query_test/test_aggregation.py
M tests/query_test/test_date_queries.py
M tests/query_test/test_hbase_queries.py
M tests/query_test/test_hdfs_caching.py
M tests/query_test/test_insert_behaviour.py
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_insert_permutation.py
M tests/query_test/test_join_queries.py
M tests/query_test/test_nested_types.py
M tests/query_test/test_observability.py
M tests/query_test/test_partitioning.py
M tests/query_test/test_resource_limits.py
M tests/query_test/test_scanners.py
M tests/shell/test_shell_commandline.py
M tests/stress/test_acid_stress.py
M tests/stress/test_ddl_stress.py
M tests/util/filesystem_utils.py
68 files changed, 303 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17121/9
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17121

to look at the new patch set (#7).

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................

IMPALA-7712: Support Google Cloud Storage

This patch adds support for GCS (Google Cloud Storage). It uses the
gcs-connector, and the implementation is similar to those of the other
remote FileSystems.

New flags for GCS:
 - num_gcs_io_threads: Number of GCS I/O threads. Defaults to 16.

Follow-up:
 - Support for spilling to GCS will be addressed in IMPALA-10561.
 - Some tests are skipped for further investigation (IMPALA-10562,
   IMPALA-10563).

Tests:
 - Compile and create hdfs test data on a GCE instance. Upload test data
   to a GCS bucket. Modify all locations in HMS DB to point to the GCS
   bucket. Remove some hdfs caching params. Run CORE tests.
 - Compile and load snapshot data to a GCS bucket. Run CORE tests.

Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
---
M be/src/exec/hdfs-table-sink.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/tmp-file-mgr.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M java/executor-deps/pom.xml
M java/pom.xml
M testdata/bin/create-load-data.sh
M testdata/bin/load-test-warehouse-snapshot.sh
M testdata/bin/run-all.sh
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
M tests/authorization/test_ranger.py
M tests/common/impala_test_suite.py
M tests/common/skip.py
M tests/custom_cluster/test_admission_controller.py
M tests/custom_cluster/test_coordinators.py
M tests/custom_cluster/test_event_processing.py
M tests/custom_cluster/test_hdfs_fd_caching.py
M tests/custom_cluster/test_hive_parquet_codec_interop.py
M tests/custom_cluster/test_hive_text_codec_interop.py
M tests/custom_cluster/test_insert_behaviour.py
M tests/custom_cluster/test_lineage.py
M tests/custom_cluster/test_local_catalog.py
M tests/custom_cluster/test_local_tz_conversion.py
M tests/custom_cluster/test_metadata_replicas.py
M tests/custom_cluster/test_parquet_max_page_header.py
M tests/custom_cluster/test_permanent_udfs.py
M tests/custom_cluster/test_query_retries.py
M tests/custom_cluster/test_restart_services.py
M tests/custom_cluster/test_topic_update_frequency.py
M tests/data_errors/test_data_errors.py
M tests/failure/test_failpoints.py
M tests/metadata/test_catalogd_debug_actions.py
M tests/metadata/test_compute_stats.py
M tests/metadata/test_ddl.py
M tests/metadata/test_hdfs_encryption.py
M tests/metadata/test_hdfs_permissions.py
M tests/metadata/test_hms_integration.py
M tests/metadata/test_metadata_query_statements.py
M tests/metadata/test_partition_metadata.py
M tests/metadata/test_refresh_partition.py
M tests/metadata/test_reset_metadata.py
M tests/metadata/test_stale_metadata.py
M tests/metadata/test_testcase_builder.py
M tests/metadata/test_views_compatibility.py
M tests/query_test/test_acid.py
M tests/query_test/test_aggregation.py
M tests/query_test/test_date_queries.py
M tests/query_test/test_hbase_queries.py
M tests/query_test/test_hdfs_caching.py
M tests/query_test/test_insert_behaviour.py
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_insert_permutation.py
M tests/query_test/test_join_queries.py
M tests/query_test/test_nested_types.py
M tests/query_test/test_observability.py
M tests/query_test/test_partitioning.py
M tests/query_test/test_resource_limits.py
M tests/query_test/test_scanners.py
M tests/shell/test_shell_commandline.py
M tests/stress/test_acid_stress.py
M tests/stress/test_ddl_stress.py
M tests/stress/test_insert_stress.py
M tests/util/filesystem_utils.py
69 files changed, 294 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17121/7
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6964/ DRY_RUN=false


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 11
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 13 Mar 2021 05:37:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6957/ DRY_RUN=true


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 10
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 12 Mar 2021 08:36:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 10:

(1 comment)

> Patch Set 10: Code-Review+2
> 
> (1 comment)
> 
> Given the analysis in IMPALA-10563, it seems fine to disable those test cases for now.
> 
> See my note about IMPALA-10579. I think it is ok to include this partial fix, as it seems better than what we have right now. If IMPALA-10579 was landing very soon, I would be ok with removing this piece of the fix and relying on IMPALA-10579.
> 
> This change makes sense to me, and it is good to have the GCS support land.

Thanks for the review, Joe! IMPALA-10579 (https://gerrit.cloudera.org/c/17171/) will take some time to land, so let's go with the conservative fix for GCS first.

http://gerrit.cloudera.org:8080/#/c/17121/10/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/17121/10/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@713
PS10, Line 713:   /**
              :    * Wrapper around FileSystem.listStatusIterator() to make sure the path exists.
              :    *
              :    * @throws FileNotFoundException if <code>p</code> does not exist
              :    * @throws IOException if any I/O error occurred
              :    */
              :   public static RemoteIterator<FileStatus> listStatusIterator(FileSystem fs, Path p)
              :       throws IOException {
              :     RemoteIterator<FileStatus> iterator = fs.listStatusIterator(p);
              :     // Some FileSystem implementations like GoogleHadoopFileSystem don't check
              :     // existence of the start path when creating the RemoteIterator. Instead, their
              :     // iterators throw the FileNotFoundException in the first call of hasNext() when
              :     // the start path doesn't exist. Here we call hasNext() to ensure the start path exists.
              :     iterator.hasNext();
              :     return iterator;
> This code will be replaced by IMPALA-10579.
Yeah, exactly! For IMPALA-10579 (https://gerrit.cloudera.org/c/17171/), I plan to test the patch on Ozone, S3 and ABFS, so it will take some time.

The changes in this patch are conservative, so we can be sure they won't impact other filesystems. (I have verified them on HDFS and GCS.)
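A minimal, self-contained sketch of the eager hasNext() idea in the wrapper quoted above. To avoid a Hadoop dependency it defines its own RemoteIterator interface with the same two methods as org.apache.hadoop.fs.RemoteIterator, and lazyList() is a hypothetical lister that, like the GoogleHadoopFileSystem behaviour described in the code comment, only checks the start path on the first hasNext() call. It illustrates the technique under those assumptions; it is not the code in the patch.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class EagerListingSketch {
  // Local stand-in with the same two methods as org.apache.hadoop.fs.RemoteIterator,
  // so the sketch compiles without Hadoop on the classpath.
  interface RemoteIterator<E> {
    boolean hasNext() throws IOException;
    E next() throws IOException;
  }

  // Hypothetical lazy lister: it does not check whether the start path exists
  // until hasNext()/next() is called, mimicking the behaviour described above.
  static RemoteIterator<String> lazyList(String path, List<String> children,
      boolean exists) {
    Iterator<String> it = children.iterator();
    return new RemoteIterator<String>() {
      @Override
      public boolean hasNext() throws IOException {
        if (!exists) throw new FileNotFoundException(path + " does not exist");
        return it.hasNext();
      }

      @Override
      public String next() throws IOException {
        if (!exists) throw new FileNotFoundException(path + " does not exist");
        return it.next();
      }
    };
  }

  // Same idea as the FileSystemUtil.listStatusIterator() wrapper: call hasNext()
  // once so that a missing start path surfaces here, at creation time, instead
  // of later in the caller's loop. Calling hasNext() does not consume an element.
  static <E> RemoteIterator<E> listStatusIterator(RemoteIterator<E> iterator)
      throws IOException {
    iterator.hasNext();
    return iterator;
  }

  public static void main(String[] args) throws IOException {
    RemoteIterator<String> ok = listStatusIterator(
        lazyList("gs://bucket/tbl", Arrays.asList("f1", "f2"), true));
    while (ok.hasNext()) {
      System.out.println(ok.next());
    }
    try {
      listStatusIterator(lazyList("gs://bucket/missing", Collections.emptyList(), false));
      System.out.println("no exception");
    } catch (FileNotFoundException e) {
      System.out.println("FileNotFoundException raised by the wrapper: " + e.getMessage());
    }
  }
}
```

The trade-off is one extra hasNext() call per listing, which is cheap for lazy iterators and makes the "path does not exist" error appear where callers already expect it.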



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 10
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 12 Mar 2021 23:17:16 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 10: Verified+1


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 10
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 12 Mar 2021 14:16:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 10: Code-Review+2

(1 comment)

Given the analysis in IMPALA-10563, it seems fine to disable those test cases for now.

See my note about IMPALA-10579. I think it is ok to include this partial fix, as it seems better than what we have right now. If IMPALA-10579 was landing very soon, I would be ok with removing this piece of the fix and relying on IMPALA-10579.

This change makes sense to me, and it is good to have the GCS support land.

http://gerrit.cloudera.org:8080/#/c/17121/10/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/17121/10/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@713
PS10, Line 713:   /**
              :    * Wrapper around FileSystem.listStatusIterator() to make sure the path exists.
              :    *
              :    * @throws FileNotFoundException if <code>p</code> does not exist
              :    * @throws IOException if any I/O error occurred
              :    */
              :   public static RemoteIterator<FileStatus> listStatusIterator(FileSystem fs, Path p)
              :       throws IOException {
              :     RemoteIterator<FileStatus> iterator = fs.listStatusIterator(p);
              :     // Some FileSystem implementations like GoogleHadoopFileSystem don't check
              :     // existence of the start path when creating the RemoteIterator. Instead, their
              :     // iterators throw the FileNotFoundException in the first call of hasNext() when
              :     // the start path doesn't exist. Here we call hasNext() to ensure the start path exists.
              :     iterator.hasNext();
              :     return iterator;
This code will be replaced by IMPALA-10579.

I'm guessing that the thought here is that this is better than what we have, and the fuller fix will come from IMPALA-10579.



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 10
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 12 Mar 2021 18:42:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [WIP] IMPALA-7712: Support Google Cloud Storage

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: [WIP] IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8281/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 02 Mar 2021 13:52:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8304/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 08 Mar 2021 15:10:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17121/4/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:

http://gerrit.cloudera.org:8080/#/c/17121/4/tests/shell/test_shell_commandline.py@656
PS4, Line 656: .
flake8: E501 line too long (91 > 90 characters)



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 08 Mar 2021 14:52:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8346/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17121
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 10
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 12 Mar 2021 07:47:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 11: Code-Review+2


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 11
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 12 Mar 2021 23:18:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] [WIP] IMPALA-7712: Support Google Cloud Storage

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: [WIP] IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8249/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Feb 2021 13:30:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 10:

I finished the analysis of IMPALA-10563. The timeouts are caused by slow file listing; see the JIRA for details. I think it is safe to skip these tests for now.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 10
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 12 Mar 2021 07:28:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 11: Verified+1


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 11
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 13 Mar 2021 11:20:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8345/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 12 Mar 2021 07:44:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 11: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6962/


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 11
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 13 Mar 2021 05:29:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 6: Code-Review+2

(6 comments)

OK, I looked at this, compared it to the ABFS change, and verified that nothing is missing.

I had a couple of small nits, but this makes sense to me. My only concern would be if IMPALA-10563 is more than just a slowdown.

http://gerrit.cloudera.org:8080/#/c/17121/5/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/17121/5/be/src/runtime/io/disk-io-mgr.cc@186
PS5, Line 186: 
> Ah, sure. I thought tests/custom_cluster/test_hdfs_fd_caching.py provide th
Yeah, file handle cache testing is not something that we have completely automated. We have basic cases covered, but we don't do the more intricate cases. It makes sense to push that out.


http://gerrit.cloudera.org:8080/#/c/17121/6/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/17121/6/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@804
PS6, Line 804: hasNex
Nit: hasNext()


http://gerrit.cloudera.org:8080/#/c/17121/6/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@805
PS6, Line 805: hashNext
Nit: hasNext()


http://gerrit.cloudera.org:8080/#/c/17121/6/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@806
PS6, Line 806: hashNext
Nit: hasNext()


http://gerrit.cloudera.org:8080/#/c/17121/6/tests/custom_cluster/test_hdfs_fd_caching.py
File tests/custom_cluster/test_hdfs_fd_caching.py:

http://gerrit.cloudera.org:8080/#/c/17121/6/tests/custom_cluster/test_hdfs_fd_caching.py@132
PS6, Line 132:     # Caching applies to HDFS, S3, ABFS, and GCS files. If this is HDFS, S3, ABFS or GCS,
             :     # then verify that caching works. Otherwise, verify that file handles are not cached.
Nit: Now we don't cache GCS file handles, so this needs to be updated.


http://gerrit.cloudera.org:8080/#/c/17121/5/tests/stress/test_insert_stress.py
File tests/stress/test_insert_stress.py:

http://gerrit.cloudera.org:8080/#/c/17121/5/tests/stress/test_insert_stress.py@81
PS5, Line 81:   @SkipIfGCS.jira(reason="IMPALA-10563")
> Yes, it's consistently reproducable on GCE instances, even if I use a newer
Ok, to be clear, the statement runs slower, but it does eventually complete correctly. Is that right?



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 10 Mar 2021 01:24:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [WIP] IMPALA-7712: Support Google Cloud Storage

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17121

to look at the new patch set (#2).

Change subject: [WIP] IMPALA-7712: Support Google Cloud Storage
......................................................................

[WIP] IMPALA-7712: Support Google Cloud Storage

This patch adds support for GCS (Google Cloud Storage).

TODO: fix hanging when loading a table/partition on a non-existing location
  (e.g. test_create_alter_bulk_partition)
TODO: fix crash in spilling when the default fs is GCS.
  (e.g. test_queries.py::TestQueries::test_analytic_fns)
TODO: Skip more tests that are skipped on non-HDFS storage.

Test steps:
 - Compile and create test data on a GCE instance.
 - Upload test data to a GCS bucket.
 - Modify the filesystem prefix of all locations in HMS DB to point to
   the GCS bucket. Remove some hdfs caching params.
 - TODO: Run CORE tests.
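The review does not show how the HMS rows were rewritten, so the sketch below only illustrates the string transformation applied to a single location: swap the filesystem prefix while keeping the path, and leave alone any location that does not start with the old prefix or only matches it partially (e.g. a different port). The prefixes hdfs://localhost:20500 and gs://my-test-bucket are assumptions for illustration; substitute the real default FS and bucket.

```java
public class LocationPrefixRewriteSketch {
  // Returns location with oldPrefix replaced by newPrefix. Locations that do not
  // start with oldPrefix, or where the match does not end at a path boundary,
  // are returned unchanged.
  static String rewritePrefix(String location, String oldPrefix, String newPrefix) {
    if (location == null || !location.startsWith(oldPrefix)) return location;
    String rest = location.substring(oldPrefix.length());
    // Guard against partial matches such as "hdfs://localhost:205001/...".
    if (!rest.isEmpty() && !rest.startsWith("/")) return location;
    return newPrefix + rest;
  }

  public static void main(String[] args) {
    String oldPrefix = "hdfs://localhost:20500";  // assumed default FS of the test cluster
    String newPrefix = "gs://my-test-bucket";     // hypothetical bucket name
    String[] locations = {
        "hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=1",
        "hdfs://localhost:205001/test-warehouse/other",
        "s3a://other-bucket/test-warehouse/tbl"
    };
    for (String loc : locations) {
      System.out.println(rewritePrefix(loc, oldPrefix, newPrefix));
    }
  }
}
```

Only the first location is rewritten; the other two are printed unchanged, which is the safe default when a location does not belong to the filesystem being migrated.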

Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
---
M be/src/exec/hdfs-table-sink.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M java/executor-deps/pom.xml
M java/pom.xml
M testdata/bin/load-test-warehouse-snapshot.sh
M testdata/bin/run-all.sh
M tests/authorization/test_ranger.py
M tests/common/impala_test_suite.py
M tests/common/skip.py
M tests/custom_cluster/test_event_processing.py
M tests/custom_cluster/test_hive_parquet_codec_interop.py
M tests/custom_cluster/test_hive_text_codec_interop.py
M tests/custom_cluster/test_local_catalog.py
M tests/custom_cluster/test_metadata_replicas.py
M tests/custom_cluster/test_parquet_max_page_header.py
M tests/custom_cluster/test_permanent_udfs.py
M tests/metadata/test_compute_stats.py
M tests/metadata/test_ddl.py
M tests/metadata/test_hms_integration.py
M tests/metadata/test_metadata_query_statements.py
M tests/metadata/test_partition_metadata.py
M tests/metadata/test_refresh_partition.py
M tests/metadata/test_reset_metadata.py
M tests/metadata/test_views_compatibility.py
M tests/query_test/test_acid.py
M tests/query_test/test_hbase_queries.py
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_nested_types.py
M tests/query_test/test_partitioning.py
M tests/query_test/test_scanners.py
M tests/stress/test_acid_stress.py
M tests/util/filesystem_utils.py
39 files changed, 187 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17121/2
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17121/5/tests/stress/test_insert_stress.py
File tests/stress/test_insert_stress.py:

http://gerrit.cloudera.org:8080/#/c/17121/5/tests/stress/test_insert_stress.py@81
PS5, Line 81:   @SkipIfGCS.jira(reason="IMPALA-10563")
> Yeah, in the time out period (600s), only half of the inserts finish. I'm t
Sorry, it turns out this is not just a slowdown. I found an infinite loop in catalogd caused by calling RemoteIterator#hasNext() in FileSystemUtil$FilterIterator#hasNext(). It seems the GCS iterator implementation doesn't skip non-existing files after throwing a FileNotFoundException. Instead, it keeps throwing the same exception for the same file on every subsequent call of hasNext().

This happens with concurrent inserts into the same table. Some transient tmp files are removed after an insert finishes, which causes the file listing of other inserts to throw FileNotFoundException. The HDFS implementation skips such files in the next call of hasNext(), but the GCS one can't.

I'm trying to find a workaround for this issue...
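The failure mode described above can be reproduced in isolation with the hypothetical sketch below. listing() simulates a directory listing in which one entry (named "deleted") disappears mid-listing: with advanceOnError=true it behaves like the HDFS iterator described above, and with false it keeps throwing for the same entry, as reported for GCS. collectSkippingMissing() is a simplified stand-in for a filter loop that skips FileNotFoundException; the maxRetries cap is an assumption added only so the demo terminates, and is not the workaround adopted in this change.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

public class SkipMissingFilesSketch {
  // Local stand-in with the same two methods as org.apache.hadoop.fs.RemoteIterator.
  interface RemoteIterator<E> {
    boolean hasNext() throws IOException;
    E next() throws IOException;
  }

  // Simulated listing in which the entry named "deleted" vanished mid-listing.
  // advanceOnError = true:  throws once for that entry, then moves past it
  //                         (the HDFS behaviour described in the review).
  // advanceOnError = false: throws for the same entry on every hasNext() call
  //                         (the GCS behaviour described in the review).
  static RemoteIterator<String> listing(List<String> names, boolean advanceOnError) {
    return new RemoteIterator<String>() {
      private int pos = 0;

      @Override
      public boolean hasNext() throws IOException {
        if (pos < names.size() && names.get(pos).equals("deleted")) {
          if (advanceOnError) pos++;
          throw new FileNotFoundException("deleted");
        }
        return pos < names.size();
      }

      @Override
      public String next() throws IOException {
        if (!hasNext()) throw new NoSuchElementException();
        return names.get(pos++);
      }
    };
  }

  // Hypothetical filter loop that skips entries which disappeared during the
  // listing. Without the maxRetries cap, the GCS-like iterator would make this
  // loop spin forever on the same FileNotFoundException.
  static List<String> collectSkippingMissing(RemoteIterator<String> it, int maxRetries)
      throws IOException {
    List<String> out = new ArrayList<>();
    int consecutiveFailures = 0;
    while (true) {
      try {
        if (!it.hasNext()) return out;
        out.add(it.next());
        consecutiveFailures = 0;
      } catch (FileNotFoundException e) {
        // Give up (rethrow) once the same failure repeats too many times in a row.
        if (++consecutiveFailures > maxRetries) throw e;
      }
    }
  }

  public static void main(String[] args) throws IOException {
    List<String> names = Arrays.asList("a", "deleted", "b");
    System.out.println("HDFS-like: " + collectSkippingMissing(listing(names, true), 3));
    try {
      collectSkippingMissing(listing(names, false), 3);
    } catch (FileNotFoundException e) {
      System.out.println("GCS-like: gave up after repeated FileNotFoundException");
    }
  }
}
```

As written, main() prints `HDFS-like: [a, b]` and then the give-up message for the GCS-like listing, showing why a "retry hasNext() on FileNotFoundException" loop is only safe when the iterator makes progress after throwing.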



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 10 Mar 2021 02:53:30 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17121

to look at the new patch set (#4).

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................

IMPALA-7712: Support Google Cloud Storage

This patch adds support for GCS (Google Cloud Storage). It uses the
gcs-connector, and the implementation is similar to that of other remote
FileSystems.

New flags for GCS:
 - num_gcs_io_threads: Number of GCS I/O threads. Defaults to 16.
 - cache_gcs_file_handles: Enable the file handle cache for GCS files.
   Defaults to true.

Follow-up:
 - Support for spilling to GCS will be addressed in IMPALA-10561.
 - Some tests are skipped for further investigation (IMPALA-10562,
   IMPALA-10563).

Tests:
 - Compile and create hdfs test data on a GCE instance. Upload test data
   to a GCS bucket. Modify all locations in HMS DB to point to the GCS
   bucket. Remove some hdfs caching params. Run CORE tests.
 - Compile and load snapshot data to a GCS bucket. Run CORE tests.

Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
---
M be/src/exec/hdfs-table-sink.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/scan-range.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M java/executor-deps/pom.xml
M java/pom.xml
M testdata/bin/create-load-data.sh
M testdata/bin/load-test-warehouse-snapshot.sh
M testdata/bin/run-all.sh
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
M tests/authorization/test_ranger.py
M tests/common/impala_test_suite.py
M tests/common/skip.py
M tests/custom_cluster/test_admission_controller.py
M tests/custom_cluster/test_coordinators.py
M tests/custom_cluster/test_event_processing.py
M tests/custom_cluster/test_hdfs_fd_caching.py
M tests/custom_cluster/test_hive_parquet_codec_interop.py
M tests/custom_cluster/test_hive_text_codec_interop.py
M tests/custom_cluster/test_insert_behaviour.py
M tests/custom_cluster/test_lineage.py
M tests/custom_cluster/test_local_catalog.py
M tests/custom_cluster/test_local_tz_conversion.py
M tests/custom_cluster/test_metadata_replicas.py
M tests/custom_cluster/test_parquet_max_page_header.py
M tests/custom_cluster/test_permanent_udfs.py
M tests/custom_cluster/test_query_retries.py
M tests/custom_cluster/test_restart_services.py
M tests/custom_cluster/test_topic_update_frequency.py
M tests/data_errors/test_data_errors.py
M tests/failure/test_failpoints.py
M tests/metadata/test_catalogd_debug_actions.py
M tests/metadata/test_compute_stats.py
M tests/metadata/test_ddl.py
M tests/metadata/test_hdfs_encryption.py
M tests/metadata/test_hdfs_permissions.py
M tests/metadata/test_hms_integration.py
M tests/metadata/test_metadata_query_statements.py
M tests/metadata/test_partition_metadata.py
M tests/metadata/test_refresh_partition.py
M tests/metadata/test_reset_metadata.py
M tests/metadata/test_stale_metadata.py
M tests/metadata/test_testcase_builder.py
M tests/metadata/test_views_compatibility.py
M tests/query_test/test_acid.py
M tests/query_test/test_aggregation.py
M tests/query_test/test_date_queries.py
M tests/query_test/test_hbase_queries.py
M tests/query_test/test_hdfs_caching.py
M tests/query_test/test_insert_behaviour.py
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_insert_permutation.py
M tests/query_test/test_join_queries.py
M tests/query_test/test_nested_types.py
M tests/query_test/test_observability.py
M tests/query_test/test_partitioning.py
M tests/query_test/test_resource_limits.py
M tests/query_test/test_scanners.py
M tests/shell/test_shell_commandline.py
M tests/stress/test_acid_stress.py
M tests/stress/test_ddl_stress.py
M tests/stress/test_insert_stress.py
M tests/util/filesystem_utils.py
70 files changed, 299 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17121/4
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8325/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 10 Mar 2021 02:05:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8305/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 08 Mar 2021 15:15:48 +0000
Gerrit-HasComments: No