Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/03/13 13:02:00 UTC
[jira] [Commented] (IMPALA-7712) Impala read from and write to GCS
[ https://issues.apache.org/jira/browse/IMPALA-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300850#comment-17300850 ]
ASF subversion and git services commented on IMPALA-7712:
---------------------------------------------------------
Commit 2dfc68d85277f05bf20c09e31dd10c9474ada62c in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=2dfc68d ]
IMPALA-7712: Support Google Cloud Storage
This patch adds support for GCS (Google Cloud Storage). Using the
gcs-connector, the implementation is similar to other remote
FileSystems.
New flags for GCS:
- num_gcs_io_threads: Number of GCS I/O threads. Defaults to 16.
Follow-up:
- Support for spilling to GCS will be addressed in IMPALA-10561.
- Support for caching GCS file handles will be addressed in
  IMPALA-10568.
- test_concurrent_inserts and test_failing_inserts in
  test_acid_stress.py are skipped due to slow file listing on
  GCS (IMPALA-10562).
- Some tests are skipped due to issues introduced by /etc/hosts setting
  on GCE instances (IMPALA-10563).
Tests:
- Compile and create hdfs test data on a GCE instance. Upload test data
  to a GCS bucket. Modify all locations in HMS DB to point to the GCS
  bucket. Remove some hdfs caching params. Run CORE tests.
- Compile and load snapshot data to a GCS bucket. Run CORE tests.
Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Reviewed-on: http://gerrit.cloudera.org:8080/17121
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
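The first test setup above repoints every table location in the HMS DB from HDFS to a GCS bucket. The following is only a minimal sketch of that path rewrite, not the procedure used for the patch. The function name to_gcs_location, the bucket name my-test-bucket, and the sample HDFS URI are hypothetical, and the sketch assumes the path below the filesystem root is kept unchanged.

```python
from urllib.parse import urlparse


def to_gcs_location(location: str, bucket: str) -> str:
    """Map an hdfs:// location to the same path under gs://<bucket>.

    Hypothetical helper: it only illustrates the URI rewrite and does not
    touch the HMS database itself.
    """
    parsed = urlparse(location)
    if parsed.scheme != "hdfs":
        # Leave locations on other filesystems (s3a://, gs://, ...) untouched.
        return location
    # Drop the namenode host:port and keep the absolute path.
    return f"gs://{bucket}{parsed.path}"


if __name__ == "__main__":
    # Sample URI and bucket name are assumptions for illustration only.
    print(to_gcs_location("hdfs://localhost:20500/test-warehouse/alltypes",
                          "my-test-bucket"))
```

Applying such a mapping to every stored location would be a separate step against the HMS backing database, which this sketch does not attempt.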
> Impala read from and write to GCS
> ---------------------------------
>
> Key: IMPALA-7712
> URL: https://issues.apache.org/jira/browse/IMPALA-7712
> Project: IMPALA
> Issue Type: New Feature
> Components: Backend
> Affects Versions: Impala 3.3.0
> Reporter: Haaris
> Assignee: Quanlong Huang
> Priority: Critical
> Labels: cloudera, connector, google-cloud-storage, impala
>
> Can Impala read from and write to Google Cloud Storage (GCS) the way it does with Amazon S3?
> I have tested the use case with S3, but when talking to GCS, Impala errors out with:
> Query: create table gcs_impala2 (title string) location 'gs://mybucket-gcs/some_data/'
> ERROR: AnalysisException: null
> CAUSED BY: RuntimeException: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
> CAUSED BY: ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
> On the same cluster, I have Hive talking to GCS using the GCS connector jar provided by Google from:
> [https://cloud.google.com/dataproc/docs/concepts/connectors/install-storage-connector]
>
> Also, HDFS reads and writes from/to GCS.
>
> I made sure the Java version matches and the appropriate values are on the classpath.
>
> Appreciate your time and effort.
> Thanks
--
This message was sent by Atlassian Jira
(v8.3.4#803005)