Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/06/28 07:51:40 UTC
[GitHub] [doris] myfjdthink commented on issue #10452: [Feature] Doris support read iceberg table on google cloud storage
myfjdthink commented on issue #10452:
URL: https://github.com/apache/doris/issues/10452#issuecomment-1168365071
In addition to the authorization issues, Doris is also unable to read data stored on GCS.
Take a look at this example.
I wrote an Iceberg table with Spark, with the data stored on GCS,
and then read it in the Hive environment, where it works fine:
`hive> select * from gs_table2;`
```
Query ID = nick_20220628041137_6fe75ec9-bae7-40b2-8d96-f6653fcdfb49
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1656302766976_0013)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 8.34 s
----------------------------------------------------------------------------------------------
OK
1 a
2 b
3 c
4 a
1 a
2 b
3 c
4 a
Time taken: 9.569 seconds, Fetched: 8 row(s)
```
`hive> describe formatted gs_table2;`
```
OK
# col_name data_type comment
id int from deserializer
data string from deserializer
# Detailed Table Information
Database: gsdb
OwnerType: USER
Owner: nick
CreateTime: Tue Jun 28 02:34:36 UTC 2022
LastAccessTime: Sun Dec 14 22:38:21 UTC 1969
Retention: 2147483647
Location: gs://iceberg-spark/datasets/gsdb.db/gs_table2
Table Type: EXTERNAL_TABLE
Table Parameters:
EXTERNAL TRUE
metadata_location gs://iceberg-spark/datasets/gsdb.db/gs_table2/metadata/00002-833dafb8-cda5-4c7c-a2ea-04e2c84fa372.metadata.json
numFiles 8
numRows 8
owner nick
previous_metadata_location gs://iceberg-spark/datasets/gsdb.db/gs_table2/metadata/00001-0c51975f-0d0a-4c2e-a5d1-aad3251c0393.metadata.json
storage_handler org.apache.iceberg.mr.hive.HiveIcebergStorageHandler
table_type ICEBERG
totalSize 4952
transient_lastDdlTime 1656383676
uuid 86645efc-8a03-47f0-a397-1ec30877b1bb
# Storage Information
SerDe Library: org.apache.iceberg.mr.hive.HiveIcebergSerDe
InputFormat: org.apache.iceberg.mr.hive.HiveIcebergInputFormat
OutputFormat: org.apache.iceberg.mr.hive.HiveIcebergOutputFormat
Compressed: No
Num Buckets: 0
Bucket Columns: []
Sort Columns: []
Time taken: 0.206 seconds, Fetched: 34 row(s)
```
Then I tried to create the Iceberg table in Doris:
```sql
CREATE TABLE `gs_table2`
ENGINE = ICEBERG
PROPERTIES (
"iceberg.database" = "gsdb",
"iceberg.table" = "gs_table2",
"iceberg.hive.metastore.uris" = "thrift://10.201.0.104:9083",
"iceberg.catalog.type" = "HIVE_CATALOG"
);
```
The SQL execution timed out and table creation failed.
Let's look at another example.
I wrote an Iceberg table with Spark, this time with the data stored in HDFS:
`hive> select * from test_table;`
```
Query ID = nick_20220628042917_7209ca76-1704-45e1-8678-88af4297b64a
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Session re-established.
Status: Running (Executing on YARN cluster with App id application_1656302766976_0014)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 7.40 s
----------------------------------------------------------------------------------------------
OK
1 a
2 b
3 c
Time taken: 14.78 seconds, Fetched: 3 row(s)
```
`hive> describe formatted test_table;`
```
OK
# col_name data_type comment
id bigint from deserializer
data string from deserializer
# Detailed Table Information
Database: testdb
OwnerType: USER
Owner: nick
CreateTime: Thu Jun 23 11:11:56 UTC 2022
LastAccessTime: Wed Dec 10 07:15:41 UTC 1969
Retention: 2147483647
Location: hdfs://10.201.0.104:8020/user/hive/warehouse/testdb.db/test_table
Table Type: EXTERNAL_TABLE
Table Parameters:
EXTERNAL TRUE
metadata_location hdfs://10.201.0.104:8020/user/hive/warehouse/testdb.db/test_table/metadata/00001-ceb6024b-3d7d-4304-8fff-f2aad293d2cf.metadata.json
numFiles 3
numRows 3
owner nick
previous_metadata_location hdfs://10.201.0.104:8020/user/hive/warehouse/testdb.db/test_table/metadata/00000-f95f2c72-1e64-421b-a4b9-bccb77984d32.metadata.json
storage_handler org.apache.iceberg.mr.hive.HiveIcebergStorageHandler
table_type ICEBERG
totalSize 1929
transient_lastDdlTime 1655982716
uuid b3a2408e-bc56-4486-bd81-65b2be22b2f3
# Storage Information
SerDe Library: org.apache.iceberg.mr.hive.HiveIcebergSerDe
InputFormat: org.apache.iceberg.mr.hive.HiveIcebergInputFormat
OutputFormat: org.apache.iceberg.mr.hive.HiveIcebergOutputFormat
Compressed: No
Num Buckets: 0
Bucket Columns: []
Sort Columns: []
Time taken: 0.144 seconds, Fetched: 34 row(s)
```
Then I tried to create the Iceberg table in Doris. It was created successfully and the data was read successfully:
```sql
CREATE TABLE `test_table`
ENGINE = ICEBERG
PROPERTIES (
"iceberg.database" = "testdb",
"iceberg.table" = "test_table",
"iceberg.hive.metastore.uris" = "thrift://10.201.0.104:9083",
"iceberg.catalog.type" = "HIVE_CATALOG"
);
select * from iceberg_db.test_table;
```
Query result:
data|id|
----+--+
a | 1|
b | 2|
c | 3|
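
One visible difference between the two cases is the storage scheme in the `Location` field of the `describe formatted` output. As a minimal sketch, the snippet below just parses the two `Location` values copied from above and prints their URI schemes. It does not talk to Doris, Hive, or GCS, and the helper name `storage_scheme` is made up for illustration.

```python
from urllib.parse import urlparse

# Location values copied from the `describe formatted` output above.
LOCATIONS = {
    "gsdb.gs_table2": "gs://iceberg-spark/datasets/gsdb.db/gs_table2",
    "testdb.test_table": "hdfs://10.201.0.104:8020/user/hive/warehouse/testdb.db/test_table",
}


def storage_scheme(location: str) -> str:
    """Return the URI scheme (for example 'gs' or 'hdfs') of a table location."""
    return urlparse(location).scheme


# Print the scheme for each table so the two cases can be compared side by side.
for table, location in LOCATIONS.items():
    print(f"{table}: {storage_scheme(location)}")
```

The table on `gs://` is the one where `CREATE TABLE` timed out, and the table on `hdfs://` is the one that worked.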
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org