You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "László Bodor (Jira)" <ji...@apache.org> on 2020/01/08 18:53:00 UTC
[jira] [Commented] (HIVE-22707) MergeJoinWork should be considered
while collecting DAG credentials
[ https://issues.apache.org/jira/browse/HIVE-22707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010922#comment-17010922 ]
László Bodor commented on HIVE-22707:
-------------------------------------
cc: [~ashutoshc], if you can take a look, thanks
> MergeJoinWork should be considered while collecting DAG credentials
> -------------------------------------------------------------------
>
> Key: HIVE-22707
> URL: https://issues.apache.org/jira/browse/HIVE-22707
> Project: Hive
> Issue Type: Bug
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Attachments: HIVE-22707.01.patch
>
>
> Given a scenario, when there are 2 different buckets, and the output is written to another bucket than the source. Under specific circumstances, FileSinkOperator is only used in Reducer stages, and if a root work in that stage is a merge join work, it's not scanned for output uris/paths, therefore needed delegation tokens are not fetched for e.g. the output s3 bucket.
> https://github.com/apache/hive/blob/0df4f6c61010b64246d4790f9ce14e966ef34dcb/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java#L1507-L1514
> {code}
> public void addCredentials(BaseWork work, DAG dag) throws IOException {
> dag.getCredentials().mergeAll(UserGroupInformation.getCurrentUser().getCredentials());
> if (work instanceof MapWork) {
> addCredentials((MapWork) work, dag);
> } else if (work instanceof ReduceWork) {
> addCredentials((ReduceWork) work, dag);
> }
> }
> {code}
> sample plan, not Merge Join Operator [MERGEJOIN_35]
> {code}
> +----------------------------------------------------+
> | Explain |
> +----------------------------------------------------+
> | Plan optimized by CBO. |
> | |
> | Vertex dependency in root stage |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE) |
> | Reducer 3 <- Reducer 2 (CUSTOM_SIMPLE_EDGE) |
> | |
> | Stage-3 |
> | Stats Work{} |
> | Stage-4 |
> | Create Table{"name:":"tpcds_bin_partitioned_orc_1000.catalog_sales_out"} |
> | Stage-0 |
> | Move Operator |
> | Stage-1 |
> | Reducer 3 |
> | File Output Operator [FS_20] |
> | Group By Operator [GBY_18] (rows=1 width=440) |
> | Output:["_col0"],aggregations:["compute_stats(VALUE._col0)"] |
> | <-Reducer 2 [CUSTOM_SIMPLE_EDGE] |
> | File Output Operator [FS_10] |
> | table:{"name:":"tpcds_bin_partitioned_orc_1000.catalog_sales_out"} |
> | Select Operator [SEL_9] (rows=8400 width=7) |
> | Output:["_col0"] |
> | Merge Join Operator [MERGEJOIN_35] (rows=8400 width=7) |
> | Conds:RS_38._col1=RS_41._col0(Inner),Output:["_col1"] |
> | <-Map 1 [SIMPLE_EDGE] vectorized |
> | SHUFFLE [RS_38] |
> | PartitionCols:_col1 |
> | Select Operator [SEL_37] (rows=16799 width=15) |
> | Output:["_col1"] |
> | Filter Operator [FIL_36] (rows=16799 width=15) |
> | predicate:((cs_sold_time_sk = 74858L) and cs_call_center_sk is not null) |
> | TableScan [TS_0] (rows=1439980416 width=15) |
> | tpcds_bin_partitioned_orc_1000@catalog_sales,cs, ACID table,Tbl:COMPLETE,Col:PARTIAL,Output:["cs_sold_time_sk","cs_call_center_sk"] |
> | <-Map 4 [SIMPLE_EDGE] vectorized |
> | SHUFFLE [RS_41] |
> | PartitionCols:_col0 |
> | Select Operator [SEL_40] (rows=21 width=107) |
> | Output:["_col0"] |
> | Filter Operator [FIL_39] (rows=21 width=107) |
> | predicate:((CAST( cc_county AS STRING) = 'Williamson County') and cc_call_center_sk is not null) |
> | TableScan [TS_3] (rows=42 width=107) |
> | tpcds_bin_partitioned_orc_1000@call_center,cc, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["cc_call_center_sk","cc_county"] |
> | PARTITION_ONLY_SHUFFLE [RS_17] |
> | Group By Operator [GBY_16] (rows=1 width=424) |
> | Output:["_col0"],aggregations:["compute_stats(col1, 'hll')"] |
> | Select Operator [SEL_15] (rows=8400 width=7) |
> | Output:["col1"] |
> | Please refer to the previous Select Operator [SEL_9] |
> | Stage-2 |
> | Dependency Collection{} |
> | Please refer to the previous Stage-1 |
> | |
> +----------------------------------------------------+
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)