You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/06/08 16:26:04 UTC
[GitHub] [iceberg] aimenglin opened a new issue #2685: How to realize Write Iceberg Tables via Hive? (Ideas share)
aimenglin opened a new issue #2685:
URL: https://github.com/apache/iceberg/issues/2685
Right now, we can read the Iceberg table via Hive, but the overlay table is an external table, so we cannot write the table via Hive.
I'm thinking about whether we can directly write a managed table via Hive. Any ideas?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] aimenglin commented on issue #2685: How to realize Write Iceberg Tables via Hive? (Ideas share)
Posted by GitBox <gi...@apache.org>.
aimenglin commented on issue #2685:
URL: https://github.com/apache/iceberg/issues/2685#issuecomment-857183739
What if we write the table as iceberg format via Hive?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] aimenglin commented on issue #2685: How to realize Write Iceberg Tables via Hive? (Ideas share)
Posted by GitBox <gi...@apache.org>.
aimenglin commented on issue #2685:
URL: https://github.com/apache/iceberg/issues/2685#issuecomment-859036146
@marton-bod I didn't find anything under this directory. There's a hidden file on test1, but still nothing inside.
```
shenme@wakeup:~$ gsutil ls -la gs://gcs-bucket-shenme-test-service-7f1dbd01-5cc9-4393-a54a-e9c2b9c9/hive-warehouse/test1/temp
CommandException: One or more URLs matched no objects.
shenme@wakeup:~$ gsutil ls -la gs://gcs-bucket-shenme-test-service-7f1dbd01-5cc9-4393-a54a-e9c2b9c9/hive-warehouse/test1
0 2021-06-09T23:25:48Z gs://gcs-bucket-shenme-test-service-7f1dbd01-5cc9-4393-a54a-e9c2b9c9/hive-warehouse/test1/#1623281148381224 metageneration=1
gs://gcs-bucket-shenme-test-service-7f1dbd01-5cc9-4393-a54a-e9c2b9c9/hive-warehouse/test1/data/
gs://gcs-bucket-shenme-test-service-7f1dbd01-5cc9-4393-a54a-e9c2b9c9/hive-warehouse/test1/metadata/
TOTAL: 1 objects, 0 bytes (0 B)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] aimenglin commented on issue #2685: How to realize Write Iceberg Tables via Hive? (Ideas share)
Posted by GitBox <gi...@apache.org>.
aimenglin commented on issue #2685:
URL: https://github.com/apache/iceberg/issues/2685#issuecomment-858180173
1. Created a Iceberg Table using Hadoop Table on Spark:
// Create a Iceberg Table
```
spark-shell --conf spark.sql.warehouse.dir=gs://shenme_dataproc_1/spark-warehouse --jars /usr/lib/iceberg/jars/iceberg-spark-runtime.jar
import org.apache.hadoop.conf.Configuration
import org.apache.iceberg.hadoop.HadoopTables
import org.apache.iceberg.Table
import org.apache.iceberg.Schema
import org.apache.iceberg.types.Types._
import org.apache.iceberg.PartitionSpec
import org.apache.iceberg.spark.SparkSchemaUtil
import org.apache.spark.sql._
```
// Insert a 4 rows
```
scala> val conf = new Configuration();
conf: org.apache.hadoop.conf.Configuration = Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xm
l, hdfs-default.xml, hdfs-site.xml, resource-types.xml
scala> val tables = new HadoopTables(conf);
tables: org.apache.iceberg.hadoop.HadoopTables = org.apache.iceberg.hadoop.HadoopTables@6fac8b24
scala> val df1 = Seq((1,"Vincent","Computer Science"),(2,"Dan", "Economics"),(3,"Bob", "Politics"),(4,"Cindy", "UX Design")).toDF("id","name","major");
df1: org.apache.spark.sql.DataFrame = [id: int, name: string ... 1 more field]
scala> val df1_schema = SparkSchemaUtil.convert(df1.schema);
df1_schema: org.apache.iceberg.Schema =
table {
0: id: required int
1: name: optional string
2: major: optional string
}
scala> val partition_spec = PartitionSpec.builderFor(df1_schema).identity("major").build;
partition_spec: org.apache.iceberg.PartitionSpec =
[
1000: major: identity(2)
]
scala> val table_location = "gs://gcs-bucket-shenme-test-service-7f1dbd01-5cc9-4393-a54a-e9c2b9c9/hive-warehouse/test1";
table_location: String = gs://gcs-bucket-shenme-test-service-7f1dbd01-5cc9-4393-a54a-e9c2b9c9/hive-warehouse/test1
scala> val hadooptable = tables.create(df1_schema, partition_spec, table_location);
hadooptable: org.apache.iceberg.Table = gs://gcs-bucket-shenme-test-service-7f1dbd01-5cc9-4393-a54a-e9c2b9c9/hive-warehouse/test1
scala> df1.write.format("iceberg").mode("overwrite").save(table_location);
```
// Query 4 rows on Spark
```
scala> val read_df1=spark.read.format("iceberg").load(table_location);
read_df1: org.apache.spark.sql.DataFrame = [id: int, name: string ... 1 more field]
scala> read_df1.show;
+---+-------+----------------+
| id| name| major|
+---+-------+----------------+
| 1|Vincent|Computer Science|
| 2| Dan| Economics|
| 3| Bob| Politics|
| 4| Cindy| UX Design|
+---+-------+----------------+
```
// Add another column
```
scala> val table = tables.load(table_location);
table: org.apache.iceberg.Table = gs://gcs-bucket-shenme-test-service-7f1dbd01-5cc9-4393-a54a-e9c2b9c9/hive-warehouse/test1
scala> table.updateSchema.addColumn("grade", StringType.get()).commit();
scala> table.schema.toString;
res8: String =
table {
1: id: required int
2: name: optional string
3: major: optional string
4: grade: optional string
}
```
// Add another 2 rows
```
scala> val df2=Seq((5,"Amy","UX Design","Sophomore")).toDF("id","name","major","grade");
df2: org.apache.spark.sql.DataFrame = [id: int, name: string ... 2 more fields]
scala> df2.write.format("iceberg").mode("append").save(table_location);
scala> val df3=Seq((6,"Rachael","Economics","Freshman")).toDF("id","name","major","grade");
df3: org.apache.spark.sql.DataFrame = [id: int, name: string ... 2 more fields]
scala> df3.write.format("iceberg").mode("append").save(table_location);
scala> val read_df2=spark.read.format("iceberg").load(table_location);
read_df2: org.apache.spark.sql.DataFrame = [id: int, name: string ... 2 more fields]
scala> spark.read.format("iceberg").load(table_location).show(truncate = false)
+---+-------+----------------+---------+
|id |name |major |grade |
+---+-------+----------------+---------+
|6 |Rachael|Economics |Freshman |
|1 |Vincent|Computer Science|null |
|2 |Dan |Economics |null |
|3 |Bob |Politics |null |
|4 |Cindy |UX Design |null |
|5 |Amy |UX Design |Sophomore|
+---+-------+----------------+---------+
```
// We have the data and the right set of metadata generated including snapshots, manifests in the bucket.
2.Created an external table on Hive
```
hive
Hive Session ID = 521d08fa-9fda-4ca3-8bab-3d0b60ddc6de
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j2.properties Async: true
Hive Session ID = 4f8d6c05-19a2-4e1e-8672-52b79d1fd566
hive> add jar /home/shenme_google_com/iceberg-hive-runtime.jar;
Added [/home/shenme_google_com/iceberg-hive-runtime.jar] to class path
Added resources: [/home/shenme_google_com/iceberg-hive-runtime.jar]
hive> set iceberg.engine.hive.enabled=true;
hive> set hive.vectorized.execution.enabled=false;
hive> CREATE EXTERNAL TABLE hadoop_table_1
> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
> LOCATION 'gs://gcs-bucket-shenme-test-service-7f1dbd01-5cc9-4393-a54a-e9c2b9c9/hive-warehouse/test1/'
> TBLPROPERTIES ('iceberg.catalog'='gs://gcs-bucket-shenme-test-service-7f1dbd01-5cc9-4393-a54a-e9c2b9c9/hive-warehouse/test1');
OK
Time taken: 3.698 seconds
```
3. Read is fine, and then I insert a new row:
```
// Read via Hive
hive> describe formatted hadoop_table_1;
OK
# col_name data_type comment
id int from deserializer
name string from deserializer
major string from deserializer
grade string from deserializer
# Detailed Table Information
Database: default
OwnerType: USER
Owner: shenme_google_com
CreateTime: Wed Jun 09 23:21:26 UTC 2021
LastAccessTime: UNKNOWN
Retention: 0
Location: gs://gcs-bucket-shenme-test-service-7f1dbd01-5cc9-4393-a54a-e9c2b9c9/hive-warehouse/test1
Table Type: EXTERNAL_TABLE
Table Parameters:
EXTERNAL TRUE
bucketing_version 2
iceberg.catalog gs://gcs-bucket-shenme-test-service-7f1dbd01-5cc9-4393-a54a-e9c2b9c9/hive-warehouse/test1
numFiles 18
storage_handler org.apache.iceberg.mr.hive.HiveIcebergStorageHandler
table_type ICEBERG
totalSize 48550
transient_lastDdlTime 1623280886
# Storage Information
SerDe Library: org.apache.iceberg.mr.hive.HiveIcebergSerDe
InputFormat: org.apache.iceberg.mr.hive.HiveIcebergInputFormat
OutputFormat: org.apache.iceberg.mr.hive.HiveIcebergOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.609 seconds, Fetched: 35 row(s)
hive> select * from hadoop_table_1;
Query ID = shenme_google_com_20210609232347_cc7ce64b-6b80-4169-b1d5-caf42a300e96
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1622522135523_0073)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 8.67 s
----------------------------------------------------------------------------------------------
OK
6 Rachael Economics Freshman
1 Vincent Computer Science NULL
2 Dan Economics NULL
3 Bob Politics NULL
4 Cindy UX Design NULL
5 Amy UX Design Sophomore
Time taken: 13.469 seconds, Fetched: 6 row(s)
```
// Insert a row
```
hive> INSERT INTO TABLE hadoop_table_1
> VALUES (7, "John", "Basketball", "Senior");
Query ID = shenme_google_com_20210609232537_9d2ce449-4ebc-403b-8f74-8b08ba9f62d1
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1622522135523_0073)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 6.71 s
----------------------------------------------------------------------------------------------
OK
Time taken: 10.374 seconds
```
// Read and check whether the data was written into the table
```
hive> select * from hadoop_table_1;
Query ID = shenme_google_com_20210609232620_021c6e8a-cf7a-43d8-9ee9-44913a900e67
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1622522135523_0073)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 7.24 s
----------------------------------------------------------------------------------------------
OK
6 Rachael Economics Freshman
1 Vincent Computer Science NULL
2 Dan Economics NULL
3 Bob Politics NULL
4 Cindy UX Design NULL
5 Amy UX Design Sophomore
Time taken: 9.196 seconds, Fetched: 6 row(s)
```
4. GS bucket screenshot, we got data generated, but no metadata. So after select, the newly inserted row was not in the result.
![image](https://user-images.githubusercontent.com/84943281/121444407-5242ca00-c944-11eb-8d53-5d6d95b3c68b.png)
![image](https://user-images.githubusercontent.com/84943281/121443762-f9266680-c942-11eb-875d-a9b0d8bd6aa1.png)
Question:
I'm curious whether it's possible to read the newly inserted data via Hive. Thanks in advance!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] marton-bod commented on issue #2685: How to realize Write Iceberg Tables via Hive? (Ideas share)
Posted by GitBox <gi...@apache.org>.
marton-bod commented on issue #2685:
URL: https://github.com/apache/iceberg/issues/2685#issuecomment-858552340
Thanks for the detailed steps! As you mentioned too, it seems that Hive generated the data files but something went wrong during the job commit (hence the lack of metadata generated). Can you share any logs from the execution side that we could take a look at?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] aimenglin commented on issue #2685: How to realize Write Iceberg Tables via Hive? (Ideas share)
Posted by GitBox <gi...@apache.org>.
aimenglin commented on issue #2685:
URL: https://github.com/apache/iceberg/issues/2685#issuecomment-858084189
@marton-bod
The result I got is that I found the data was stored in the bucket, but no metadata generated. Also, if I read the table via Spark and Hive, the data was not updated and shown. Maybe I missed something when I created the external table, so I'll look at the resource you given. Thanks so much!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] marton-bod commented on issue #2685: How to realize Write Iceberg Tables via Hive? (Ideas share)
Posted by GitBox <gi...@apache.org>.
marton-bod commented on issue #2685:
URL: https://github.com/apache/iceberg/issues/2685#issuecomment-858008951
Hi @aimenglin , you should be able to write to overlay tables as well from Hive. Please refer to our documentation: https://github.com/apache/iceberg/blob/master/site/docs/hive.md. Hope this helps resolve your issue. In case not, please send the exact steps you tried and the error message you encountered, so we may be able to help further.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] marton-bod commented on issue #2685: How to realize Write Iceberg Tables via Hive? (Ideas share)
Posted by GitBox <gi...@apache.org>.
marton-bod commented on issue #2685:
URL: https://github.com/apache/iceberg/issues/2685#issuecomment-859021968
Thanks for details! Sorry, I wasn't clear on which temp folder I'd like to see the contents of. Can you check this one:
`gs://gcs-bucket-shenme-test-service-7f1dbd01-5cc9-4393-a54a-e9c2b9c9/hive-warehouse/test1/temp`?
As for the logs, they are from the HS2 side, but is there a chance to get the logs from the Hadoop cluster where the MR job was executing? Unfortunately the job commit happens on the MR job-side and not on the HS2-side.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] aimenglin commented on issue #2685: How to realize Write Iceberg Tables via Hive? (Ideas share)
Posted by GitBox <gi...@apache.org>.
aimenglin commented on issue #2685:
URL: https://github.com/apache/iceberg/issues/2685#issuecomment-858925332
@marton-bod Thanks for your patience and time for helping me debug.
1. Log
```
2021-06-09T23:25:37,850 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ql.Driver: Compiling command(queryId=shenme_google_com_20210609232537_9d2ce449-4ebc-403b-8f74-8b08ba9f62d1): INSERT INTO TABLE hadoop_table_1
VALUES (7, "John", "Basketball", "Senior")
2021-06-09T23:25:37,876 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ql.Driver: Concurrency mode is disabled, not creating a lock manager
2021-06-09T23:25:37,876 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] parse.CalcitePlanner: Starting Semantic Analysis
2021-06-09T23:25:37,903 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] parse.CalcitePlanner: Completed phase 1 of Semantic Analysis
2021-06-09T23:25:37,903 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] parse.CalcitePlanner: Get metadata for source tables
2021-06-09T23:25:37,903 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] parse.CalcitePlanner: Get metadata for subqueries
2021-06-09T23:25:37,903 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] parse.CalcitePlanner: Get metadata for destination tables
2021-06-09T23:25:37,928 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] parse.CalcitePlanner: Completed getting MetaData in Semantic Analysis
2021-06-09T23:25:38,101 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ql.Context: New scratch dir is hdfs://shenme-cluster-3-m/tmp/hive/shenme_google_com/521d08fa-9fda-4ca3-8bab-3d0b60ddc6de/hive_2021-06-09_23-25-37_866_2611149539519878546-1
2021-06-09T23:25:38,619 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] parse.CalcitePlanner: Get metadata for source tables
2021-06-09T23:25:38,620 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] parse.CalcitePlanner: Get metadata for subqueries
2021-06-09T23:25:38,620 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] parse.CalcitePlanner: Get metadata for source tables
2021-06-09T23:25:38,631 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] parse.CalcitePlanner: Get metadata for subqueries
2021-06-09T23:25:38,631 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] parse.CalcitePlanner: Get metadata for destination tables
2021-06-09T23:25:40,604 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ql.Driver: Launching Job 1 out of 1
2021-06-09T23:25:40,604 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ql.Driver: Starting task [Stage-2:MAPRED] in serial mode
2021-06-09T23:25:40,686 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ql.Context: New scratch dir is hdfs://shenme-cluster-3-m/tmp/hive/shenme_google_com/521d08fa-9fda-4ca3-8bab-3d0b60ddc6de/hive_2021-06-09_23-25-37_866_2611149539519878546-1
2021-06-09T23:25:40,691 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] tez.TezSessionPoolManager: The current user: shenme_google_com, session user: shenme_google_com
2021-06-09T23:25:40,691 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] tez.TezSessionPoolManager: Current queue name is null incoming queue name is null
2021-06-09T23:25:40,691 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] exec.Task: Subscribed to counters: [] for queryId: shenme_google_com_20210609232537_9d2ce449-4ebc-403b-8f74-8b08ba9f62d1
2021-06-09T23:25:38,631 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] parse.CalcitePlanner: Get metadata for destination tables
2021-06-09T23:25:38,696 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ql.Context: New scratch dir is hdfs://shenme-cluster-3-m/tmp/hive/shenme_google_com/521d08fa-9fda-4ca3-8bab-3d0b60ddc6de/hive_2021-06-09_23-25-37_866_2611149539519878546-1
2021-06-09T23:25:38,702 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] mr.Catalogs: Catalog is not configured
2021-06-09T23:25:38,944 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] hadoop.HadoopTables: Table location loaded: gs://gcs-bucket-shenme-test-service-7f1dbd01-5cc9-4393-a54a-e9c2b9c9/hive-warehouse/test1
2021-06-09T23:25:38,944 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] hive.HiveIcebergSerDe: Using schema from existing table {"type":"struct","fields":[{"id":1,"name":"id","required":true,"type":"int"},{"id":2,"name":"name","required":false,"type":"string"},{"id":3,"name":"major","required":false,"type":"string"},{"id":4,"name":"grade","required":false,"type":"string"}]}
2021-06-09T23:25:39,006 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] mr.Catalogs: Catalog is not configured
2021-06-09T23:25:39,238 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] hadoop.HadoopTables: Table location loaded: gs://gcs-bucket-shenme-test-service-7f1dbd01-5cc9-4393-a54a-e9c2b9c9/hive-warehouse/test1
2021-06-09T23:25:39,238 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] hive.HiveIcebergSerDe: Using schema from existing table {"type":"struct","fields":[{"id":1,"name":"id","required":true,"type":"int"},{"id":2,"name":"name","required":false,"type":"string"},{"id":3,"name":"major","required":false,"type":"string"},{"id":4,"name":"grade","required":false,"type":"string"}]}
2021-06-09T23:25:39,238 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] mr.Catalogs: Catalog is not configured
2021-06-09T23:25:39,436 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] hadoop.HadoopTables: Table location loaded: gs://gcs-bucket-shenme-test-service-7f1dbd01-5cc9-4393-a54a-e9c2b9c9/hive-warehouse/test1
2021-06-09T23:25:39,436 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] hive.HiveIcebergSerDe: Using schema from existing table {"type":"struct","fields":[{"id":1,"name":"id","required":true,"type":"int"},{"id":2,"name":"name","required":false,"type":"string"},{"id":3,"name":"major","required":false,"type":"string"},{"id":4,"name":"grade","required":false,"type":"string"}]}
2021-06-09T23:25:39,437 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] common.FileUtils: Creating directory if it doesn't exist: gs://gcs-bucket-shenme-test-service-7f1dbd01-5cc9-4393-a54a-e9c2b9c9/hive-warehouse/test1/.hive-staging_hive_2021-06-09_23-25-37_866_2611149539519878546-1
2021-06-09T23:25:39,518 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] mr.Catalogs: Catalog is not configured
2021-06-09T23:25:39,707 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] hadoop.HadoopTables: Table location loaded: gs://gcs-bucket-shenme-test-service-7f1dbd01-5cc9-4393-a54a-e9c2b9c9/hive-warehouse/test1
2021-06-09T23:25:39,800 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] parse.CalcitePlanner: CBO Succeeded; optimized logical plan.
2021-06-09T23:25:39,800 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ppd.OpProcFactory: Processing for FS(4)
2021-06-09T23:25:39,800 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ppd.OpProcFactory: Processing for SEL(3)
2021-06-09T23:25:39,800 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ppd.OpProcFactory: Processing for UDTF(2)
2021-06-09T23:25:39,800 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ppd.OpProcFactory: Processing for SEL(1)
2021-06-09T23:25:39,800 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ppd.OpProcFactory: Processing for TS(0)
2021-06-09T23:25:39,823 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] parse.TezCompiler: Cycle free: true
2021-06-09T23:25:39,828 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] parse.CalcitePlanner: Completed plan generation
2021-06-09T23:25:39,829 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ql.Driver: Semantic Analysis Completed (retrial = false)
2021-06-09T23:25:39,829 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:col1, type:int, comment:null), FieldSchema(name:col2, type:string, comment:null), FieldSchema(name:col3, type:string, comment:null), FieldSchema(name:col4, type:string, comment:null)], properties:null)
2021-06-09T23:25:39,829 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ql.Driver: Completed compiling command(queryId=shenme_google_com_20210609232537_9d2ce449-4ebc-403b-8f74-8b08ba9f62d1); Time taken: 1.979 seconds
2021-06-09T23:25:39,829 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] reexec.ReExecDriver: Execution #1 of query
2021-06-09T23:25:39,829 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ql.Driver: Concurrency mode is disabled, not creating a lock manager
2021-06-09T23:25:39,829 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ql.Driver: Executing command(queryId=shenme_google_com_20210609232537_9d2ce449-4ebc-403b-8f74-8b08ba9f62d1): INSERT INTO TABLE hadoop_table_1
VALUES (7, "John", "Basketball", "Senior")
2021-06-09T23:25:39,829 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ql.Driver: Query ID = shenme_google_com_20210609232537_9d2ce449-4ebc-403b-8f74-8b08ba9f62d1
2021-06-09T23:25:39,829 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ql.Driver: Total jobs = 1
2021-06-09T23:25:39,830 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ql.Driver: Starting task [Stage-0:DDL] in serial mode
2021-06-09T23:25:39,852 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] mr.Catalogs: Catalog is not configured
2021-06-09T23:25:40,093 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] hadoop.HadoopTables: Table location loaded: gs://gcs-bucket-shenme-test-service-7f1dbd01-5cc9-4393-a54a-e9c2b9c9/hive-warehouse/test1
2021-06-09T23:25:40,093 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] hive.HiveIcebergSerDe: Using schema from existing table {"type":"struct","fields":[{"id":1,"name":"id","required":true,"type":"int"},{"id":2,"name":"name","required":false,"type":"string"},{"id":3,"name":"major","required":false,"type":"string"},{"id":4,"name":"grade","required":false,"type":"string"}]}
2021-06-09T23:25:40,094 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] mr.Catalogs: Catalog is not configured
2021-06-09T23:25:40,327 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] hadoop.HadoopTables: Table location loaded: gs://gcs-bucket-shenme-test-service-7f1dbd01-5cc9-4393-a54a-e9c2b9c9/hive-warehouse/test1
2021-06-09T23:25:40,327 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] hive.HiveIcebergSerDe: Using schema from existing table {"type":"struct","fields":[{"id":1,"name":"id","required":true,"type":"int"},{"id":2,"name":"name","required":false,"type":"string"},{"id":3,"name":"major","required":false,"type":"string"},{"id":4,"name":"grade","required":false,"type":"string"}]}
2021-06-09T23:25:40,328 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] mr.Catalogs: Catalog is not configured
2021-06-09T23:25:40,528 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] hadoop.HadoopTables: Table location loaded: gs://gcs-bucket-shenme-test-service-7f1dbd01-5cc9-4393-a54a-e9c2b9c9/hive-warehouse/test1
2021-06-09T23:25:40,528 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] hive.HiveIcebergSerDe: Using schema from existing table {"type":"struct","fields":[{"id":1,"name":"id","required":true,"type":"int"},{"id":2,"name":"name","required":false,"type":"string"},{"id":3,"name":"major","required":false,"type":"string"},{"id":4,"name":"grade","required":false,"type":"string"}]}
2021-06-09T23:25:40,597 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ql.Driver: Starting task [Stage-1:DDL] in serial mode
2021-06-09T23:25:40,604 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ql.Driver: Launching Job 1 out of 1
2021-06-09T23:25:40,604 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ql.Driver: Starting task [Stage-2:MAPRED] in serial mode
2021-06-09T23:25:40,686 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ql.Context: New scratch dir is hdfs://shenme-cluster-3-m/tmp/hive/shenme_google_com/521d08fa-9fda-4ca3-8bab-3d0b60ddc6de/hive_2021-06-09_23-25-37_866_2611149539519878546-1
2021-06-09T23:25:40,691 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] tez.TezSessionPoolManager: The current user: shenme_google_com, session user: shenme_google_com
2021-06-09T23:25:40,691 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] tez.TezSessionPoolManager: Current queue name is null incoming queue name is null
2021-06-09T23:25:40,691 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] exec.Task: Subscribed to counters: [] for queryId: shenme_google_com_20210609232537_9d2ce449-4ebc-403b-8f74-8b08ba9f62d1
2021-06-09T23:25:40,691 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] exec.Task: Session is already open
2021-06-09T23:25:40,694 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] tez.DagUtils: Resource modification time: 1623281030625 for hdfs://shenme-cluster-3-m/tmp/hive/shenme_google_com/_tez_session_dir/521d08fa-9fda-4ca3-8bab-3d0b60ddc6de-resources/iceberg-hive-runtime.jar
2021-06-09T23:25:40,694 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] exec.Task: Dag name: INSERT INTO TABLE hadoop_table_1..."Senior") (Stage-2)
2021-06-09T23:25:40,698 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ql.Context: New scratch dir is hdfs://shenme-cluster-3-m/tmp/hive/shenme_google_com/521d08fa-9fda-4ca3-8bab-3d0b60ddc6de/hive_2021-06-09_23-25-37_866_2611149539519878546-1
2021-06-09T23:25:40,758 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] tez.DagUtils: Vertex has custom input? false
2021-06-09T23:25:40,759 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] exec.SerializationUtilities: Serializing MapWork using kryo
2021-06-09T23:25:40,789 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] exec.Utilities: Serialized plan (via RPC) - name: Map 1 size: 301.80KB
2021-06-09T23:25:40,891 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] client.TezClient: Submitting dag to TezSession, sessionName=HIVE-521d08fa-9fda-4ca3-8bab-3d0b60ddc6de, applicationId=application_1622522135523_0073, dagName=INSERT INTO TABLE hadoop_table_1..."Senior") (Stage-2), callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, callerId=shenme_google_com_20210609232537_9d2ce449-4ebc-403b-8f74-8b08ba9f62d1 }
2021-06-09T23:25:40,991 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] client.TezClient: Submitted dag to TezSession, sessionName=HIVE-521d08fa-9fda-4ca3-8bab-3d0b60ddc6de, applicationId=application_1622522135523_0073, dagId=dag_1622522135523_0073_2, dagName=INSERT INTO TABLE hadoop_table_1..."Senior") (Stage-2)
2021-06-09T23:25:41,513 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] SessionState: Status: Running (Executing on YARN cluster with App id application_1622522135523_0073)
2021-06-09T23:25:41,515 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] SessionState: Map 1: 0/1
2021-06-09T23:25:44,542 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] SessionState: Map 1: 0(+1)/1
2021-06-09T23:25:47,576 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] SessionState: Map 1: 0(+1)/1
2021-06-09T23:25:48,220 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] SessionState: Map 1: 1/1
2021-06-09T23:25:48,223 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ql.Driver: Starting task [Stage-4:DDL] in serial mode
2021-06-09T23:25:48,224 INFO [521d08fa-9fda-4ca3-8bab-3d0b60ddc6de main] ql.Driver: Completed executing command(queryId=shenme_google_com_20210609232537_9d2ce449-4ebc-403b-8f74-8b08ba9f62d1); Time taken: 8.395 seconds
```
2. Version
![image](https://user-images.githubusercontent.com/84943281/121573545-3c7fe400-c9da-11eb-9bad-5b20e070302b.png)
3. Explain Plan
```
hive> EXPLAIN INSERT INTO TABLE hadoop_table_1 VALUES (8, "Eve", "Zoology", "Senior");
OK
Plan optimized by CBO.
Stage-0
Alter Table Operator:
old name:default.hadoop_table_1,type:drop props
Stage-1
Pre Insert operator:{"Pre-Insert task":{}}
Stage-4
Insert operator:{"Commit-Insert-Hook":{}}
Stage-2
Map 1
File Output Operator [FS_4]
table:{"name:":"default.hadoop_table_1"}
Select Operator [SEL_3] (rows=1 width=8)
Output:["_col0","_col1","_col2","_col3"]
UDTF Operator [UDTF_2] (rows=1 width=48)
function name:inline
Select Operator [SEL_1] (rows=1 width=48)
Output:["_col0"]
TableScan [TS_0] (rows=1 width=10)
_dummy_database@_dummy_table,_dummy_table,Tbl:COMPLETE,Col:COMPLETE
Time taken: 1.592 seconds, Fetched: 22 row(s)
```
4. Everytime when I insert the data, there'll be a _tmp file, but it's empty.
```
gsutil ls -la gs://gcs-bucket-shenme-test-service-7f1dbd01-5cc9-4393-a54a-e9c2b9c9/hive-warehouse/_tmp.test1/
gsutil cat gs://gcs-bucket-shenme-test-service-7f1dbd01-5cc9-4393-a54a-e9c2b9c9/hive-warehouse/_tmp.test1/#1623347746784286
```
![image](https://user-images.githubusercontent.com/84943281/121574915-98973800-c9db-11eb-9eec-6aba62c10479.png)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] marton-bod commented on issue #2685: How to realize Write Iceberg Tables via Hive? (Ideas share)
Posted by GitBox <gi...@apache.org>.
marton-bod commented on issue #2685:
URL: https://github.com/apache/iceberg/issues/2685#issuecomment-858559409
Also, can you please share:
- version of Iceberg and Hive that you use
- the explain plan of the insert query
- the contents of the tableLocation/temp/ directory (this is where Hive stores the write commit related temporary files)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org