Posted to issues@hive.apache.org by "Rui Li (JIRA)" <ji...@apache.org> on 2017/10/24 03:38:00 UTC

[jira] [Commented] (HIVE-17877) HoS: combine equivalent DPP sink works

    [ https://issues.apache.org/jira/browse/HIVE-17877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16216249#comment-16216249 ] 

Rui Li commented on HIVE-17877:
-------------------------------

Uploaded a PoC patch. Here are the main changes:
# Before combining, each {{SparkPartitionPruningSinkDesc}} can target only one column in one map work. After combining, the remaining {{SparkPartitionPruningSinkDesc}} will hold the columns and map works from the other equivalent {{SparkPartitionPruningSinkDesc}}.
# Two {{SparkPartitionPruningSinkDesc}} are equivalent if they have the same TableDesc.
# When we combine two equivalent works that contain DPP sinks, we also merge the DPP sinks. Suppose we merge DPP1 and DPP2, which have target map works Map1 and Map2 respectively. First we add the target column/work of DPP2 to DPP1. Then we update Map2 so that it knows it'll be pruned by DPP1 instead of DPP2, i.e. we update its {{eventSource}} maps and tmp path.
# Currently {{CombineEquivalentWorkResolver}} doesn't handle leaf works. With the patch, it'll handle leaf works if all leaf operators in the leaf works are DPP sinks.
# Currently {{SparkPartitionPruningSinkOperator}} writes the target column name into the output file. Since it can now have multiple target columns, it first writes the number of columns and then writes all the target column names. To make column names unique, the target map work ID is prepended to the column name.
# When {{SparkDynamicPartitionPruner}} reads the file, it reads in all the column names and finds the {{SourceInfo}} whose name is in the column names.
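
To make the merge and the new file layout concrete, here is a minimal sketch in Java. It is not the code from the patch: {{SinkDesc}}, {{Target}}, the {{:}} separator between work ID and column name, and the use of {{DataOutputStream}} are all assumptions made for illustration. It only models adding DPP2's targets to DPP1, writing the column count followed by the qualified column names, and reading them back. Updating Map2's {{eventSource}} maps and tmp path is not modelled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class DppSinkSketch {

    /** Hypothetical stand-in for one (target map work, target column) pair. */
    static final class Target {
        final String workId;
        final String column;

        Target(String workId, String column) {
            this.workId = workId;
            this.column = column;
        }

        /** Work ID prepended to the column name; the ':' separator is an assumption. */
        String qualifiedName() {
            return workId + ":" + column;
        }
    }

    /** Hypothetical stand-in for SparkPartitionPruningSinkDesc. */
    static final class SinkDesc {
        final String tableDesc; // the real descriptor holds a TableDesc object
        final List<Target> targets = new ArrayList<>();

        SinkDesc(String tableDesc) {
            this.tableDesc = tableDesc;
        }
    }

    /** Two sinks are treated as equivalent when their table descriptors match. */
    static boolean equivalent(SinkDesc a, SinkDesc b) {
        return a.tableDesc.equals(b.tableDesc);
    }

    /** Moves every target of {@code drop} onto {@code keep}. */
    static void merge(SinkDesc keep, SinkDesc drop) {
        if (!equivalent(keep, drop)) {
            throw new IllegalArgumentException("sinks are not equivalent");
        }
        keep.targets.addAll(drop.targets);
    }

    /** Writes the number of target columns, then each qualified column name. */
    static byte[] writeHeader(SinkDesc desc) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(desc.targets.size());
            for (Target t : desc.targets) {
                out.writeUTF(t.qualifiedName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads back all column names so a reader can look for its own name. */
    static List<String> readHeader(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            List<String> names = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                names.add(in.readUTF());
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this layout, a reader for Map2 would check whether its own qualified name (here {{Map2:ds}}, a made-up example) appears in the list returned by {{readHeader}}, which mirrors how {{SparkDynamicPartitionPruner}} is described as matching {{SourceInfo}} names above.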

> HoS: combine equivalent DPP sink works
> --------------------------------------
>
>                 Key: HIVE-17877
>                 URL: https://issues.apache.org/jira/browse/HIVE-17877
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-17877.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)