You are viewing a plain text version of this content. The canonical link for it is here.

Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2020/09/28 06:43:54 UTC

[GitHub] [hive] shameersss1 commented on a change in pull request #1400: HIVE-18284: Fix NPE when inserting data with 'distribute by' clause with dynpart sort optimization

shameersss1 commented on a change in pull request #1400:
URL: https://github.com/apache/hive/pull/1400#discussion_r495719674



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplicationUtils.java
##########
@@ -181,6 +183,23 @@ public static boolean merge(HiveConf hiveConf, ReduceSinkOperator cRS, ReduceSin
         TableDesc keyTable = PlanUtils.getReduceKeyTableDesc(new ArrayList<FieldSchema>(), pRS
             .getConf().getOrder(), pRS.getConf().getNullOrder());
         pRS.getConf().setKeySerializeInfo(keyTable);
+      } else if (cRS.getConf().getKeyCols() != null && cRS.getConf().getKeyCols().size() > 0) {
+        ArrayList<String> keyColNames = Lists.newArrayList();
+        for (ExprNodeDesc keyCol : pRS.getConf().getKeyCols()) {
+          String keyColName = keyCol.getExprString();
+          keyColNames.add(keyColName);
+        }
+        List<FieldSchema> fields = PlanUtils.getFieldSchemasFromColumnList(pRS.getConf().getKeyCols(),
+            keyColNames, 0, "");
+        TableDesc keyTable = PlanUtils.getReduceKeyTableDesc(fields, pRS.getConf().getOrder(),
+            pRS.getConf().getNullOrder());
+        ArrayList<String> outputKeyCols = Lists.newArrayList();
+        for (int i = 0; i < fields.size(); i++) {
+          outputKeyCols.add(fields.get(i).getName());
+        }
+        pRS.getConf().setOutputKeyColumnNames(outputKeyCols);
+        pRS.getConf().setKeySerializeInfo(keyTable);
+        pRS.getConf().setNumDistributionKeys(cRS.getConf().getNumDistributionKeys());
       }

Review comment:
       Just to add more context here, Number of distribution keys of cRS is chosen only when numDistKeys of pRS is 0 or less. In all other cases, distribution of the keys is based on the pRS which is more generic than cRS. We will enter this "if" condition only in two cases
   
   1. pRS keyCol is empty and cRS keyCol is empty
   2. pRS keyCol is empty and cRS keyCol is not empty
   
   So in case I we would like to keep the pRS properties intact since pRS is more generic. In case (2) we want to go with cRS properties hence i think returning false is not required.
   
   Does this make sense? Or am i missing any thing?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org