You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2022/02/04 09:07:58 UTC

[GitHub] [hive] szlta commented on a change in pull request #2842: HIVE-25772: Use ClusteredWriter when writing to Iceberg tables

szlta commented on a change in pull request #2842:
URL: https://github.com/apache/hive/pull/2842#discussion_r799282617



##########
File path: iceberg/iceberg-handler/src/test/results/positive/dynamic_partition_writes.q.out
##########
@@ -0,0 +1,220 @@
+PREHOOK: query: drop table if exists tbl_src
+PREHOOK: type: DROPTABLE
+POSTHOOK: query: drop table if exists tbl_src
+POSTHOOK: type: DROPTABLE
+PREHOOK: query: drop table if exists tbl_target_identity
+PREHOOK: type: DROPTABLE
+POSTHOOK: query: drop table if exists tbl_target_identity
+POSTHOOK: type: DROPTABLE
+PREHOOK: query: drop table if exists tbl_target_bucket
+PREHOOK: type: DROPTABLE
+POSTHOOK: query: drop table if exists tbl_target_bucket
+POSTHOOK: type: DROPTABLE
+PREHOOK: query: create external table tbl_src (a int, b string) stored by iceberg stored as orc
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@tbl_src
+POSTHOOK: query: create external table tbl_src (a int, b string) stored by iceberg stored as orc
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@tbl_src
+PREHOOK: query: insert into tbl_src values (1, 'EUR'), (2, 'EUR'), (3, 'USD'), (4, 'EUR'), (5, 'HUF'), (6, 'USD'), (7, 'USD'), (8, 'PLN'), (9, 'PLN'), (10, 'CZK')
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@tbl_src
+POSTHOOK: query: insert into tbl_src values (1, 'EUR'), (2, 'EUR'), (3, 'USD'), (4, 'EUR'), (5, 'HUF'), (6, 'USD'), (7, 'USD'), (8, 'PLN'), (9, 'PLN'), (10, 'CZK')
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@tbl_src
+PREHOOK: query: insert into tbl_src values (10, 'EUR'), (20, 'EUR'), (30, 'USD'), (40, 'EUR'), (50, 'HUF'), (60, 'USD'), (70, 'USD'), (80, 'PLN'), (90, 'PLN'), (100, 'CZK')
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@tbl_src
+POSTHOOK: query: insert into tbl_src values (10, 'EUR'), (20, 'EUR'), (30, 'USD'), (40, 'EUR'), (50, 'HUF'), (60, 'USD'), (70, 'USD'), (80, 'PLN'), (90, 'PLN'), (100, 'CZK')
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@tbl_src
+PREHOOK: query: create external table tbl_target_identity (a int) partitioned by (ccy string) stored by iceberg stored as orc
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@tbl_target_identity
+POSTHOOK: query: create external table tbl_target_identity (a int) partitioned by (ccy string) stored by iceberg stored as orc
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@tbl_target_identity
+PREHOOK: query: explain insert overwrite table tbl_target_identity select * from tbl_src
+PREHOOK: type: QUERY
+PREHOOK: Input: default@tbl_src
+PREHOOK: Output: default@tbl_target_identity
+POSTHOOK: query: explain insert overwrite table tbl_target_identity select * from tbl_src
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@tbl_src
+POSTHOOK: Output: default@tbl_target_identity
+Plan optimized by CBO.
+
+Vertex dependency in root stage
+Reducer 2 <- Map 1 (SIMPLE_EDGE)
+Reducer 3 <- Map 1 (CUSTOM_SIMPLE_EDGE)
+
+Stage-3
+  Stats Work{}
+    Stage-0
+      Move Operator
+        table:{"name:":"default.tbl_target_identity"}
+        Stage-2
+          Dependency Collection{}
+            Stage-1
+              Reducer 2 vectorized
+              File Output Operator [FS_18]
+                table:{"name:":"default.tbl_target_identity"}
+                Select Operator [SEL_17]
+                  Output:["_col0","_col1"]
+                <-Map 1 [SIMPLE_EDGE] vectorized
+                  PARTITION_ONLY_SHUFFLE [RS_13]
+                    PartitionCols:_col1
+                    Select Operator [SEL_12] (rows=20 width=91)
+                      Output:["_col0","_col1"]
+                      TableScan [TS_0] (rows=20 width=91)
+                        default@tbl_src,tbl_src,Tbl:COMPLETE,Col:COMPLETE,Output:["a","b"]
+              Reducer 3 vectorized
+              File Output Operator [FS_21]
+                Select Operator [SEL_20] (rows=1 width=530)
+                  Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11"]
+                  Group By Operator [GBY_19] (rows=1 width=332)
+                    Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["min(VALUE._col0)","max(VALUE._col1)","count(VALUE._col2)","count(VALUE._col3)","compute_bit_vector_hll(VALUE._col4)","max(VALUE._col5)","avg(VALUE._col6)","count(VALUE._col7)","compute_bit_vector_hll(VALUE._col8)"]
+                  <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized
+                    PARTITION_ONLY_SHUFFLE [RS_16]
+                      Group By Operator [GBY_15] (rows=1 width=400)
+                        Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["min(a)","max(a)","count(1)","count(a)","compute_bit_vector_hll(a)","max(length(ccy))","avg(COALESCE(length(ccy),0))","count(ccy)","compute_bit_vector_hll(ccy)"]
+                        Select Operator [SEL_14] (rows=20 width=91)
+                          Output:["a","ccy"]
+                           Please refer to the previous Select Operator [SEL_12]
+
+PREHOOK: query: insert overwrite table tbl_target_identity select * from tbl_src
+PREHOOK: type: QUERY
+PREHOOK: Input: default@tbl_src
+PREHOOK: Output: default@tbl_target_identity
+POSTHOOK: query: insert overwrite table tbl_target_identity select * from tbl_src
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@tbl_src
+POSTHOOK: Output: default@tbl_target_identity
+PREHOOK: query: select * from tbl_target_identity
+PREHOOK: type: QUERY
+PREHOOK: Input: default@tbl_target_identity
+PREHOOK: Output: hdfs://### HDFS PATH ###
+POSTHOOK: query: select * from tbl_target_identity
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@tbl_target_identity
+POSTHOOK: Output: hdfs://### HDFS PATH ###
+100	CZK
+10	CZK
+10	EUR
+20	EUR
+40	EUR
+2	EUR
+4	EUR
+1	EUR
+50	HUF
+5	HUF
+90	PLN
+8	PLN
+9	PLN
+80	PLN
+30	USD
+3	USD
+6	USD
+7	USD
+60	USD
+70	USD
+PREHOOK: query: create external table tbl_target_bucket (a int, ccy string) partitioned by spec (bucket (2, ccy)) stored by iceberg stored as orc
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@tbl_target_bucket
+POSTHOOK: query: create external table tbl_target_bucket (a int, ccy string) partitioned by spec (bucket (2, ccy)) stored by iceberg stored as orc
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@tbl_target_bucket
+PREHOOK: query: explain insert into table tbl_target_bucket select * from tbl_src
+PREHOOK: type: QUERY
+PREHOOK: Input: default@tbl_src
+PREHOOK: Output: default@tbl_target_bucket
+POSTHOOK: query: explain insert into table tbl_target_bucket select * from tbl_src
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@tbl_src
+POSTHOOK: Output: default@tbl_target_bucket
+Plan optimized by CBO.
+
+Vertex dependency in root stage
+Reducer 2 <- Map 1 (SIMPLE_EDGE)
+Reducer 3 <- Map 1 (CUSTOM_SIMPLE_EDGE)
+
+Stage-3

Review comment:
       If there were no sort, you would only see one Reducer task. I believe in this case it's Reducer 2.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org