Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2020/09/02 13:58:17 UTC

[GitHub] [hive] kgyrtkirk commented on a change in pull request #1439: HIVE-24084 cost aggr

kgyrtkirk commented on a change in pull request #1439:
URL: https://github.com/apache/hive/pull/1439#discussion_r482090289



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAggregateJoinTransposeRule.java
##########
@@ -303,6 +305,90 @@ public void onMatch(RelOptRuleCall call) {
     }
   }
 
+  /**
+   * Determines whether the given grouping is unique.
+   *
+   * Consider a join which might produce non-unique rows; but later the results are aggregated again.
+   * This method determines if there are sufficient columns in the grouping which have been present previously as unique column(s).
+   */
+  private boolean isGroupingUnique(RelNode input, ImmutableBitSet groups) {
+    if (groups.isEmpty()) {
+      return false;
+    }
+    RelMetadataQuery mq = input.getCluster().getMetadataQuery();
+    Set<ImmutableBitSet> uKeys = mq.getUniqueKeys(input);
+    for (ImmutableBitSet u : uKeys) {
+      if (groups.contains(u)) {
+        return true;
+      }
+    }
+    if (input instanceof Join) {
+      Join join = (Join) input;
+      RexBuilder rexBuilder = input.getCluster().getRexBuilder();
+      SimpleConditionInfo cond = new SimpleConditionInfo(join.getCondition(), rexBuilder);
+
+      if (cond.valid) {
+        ImmutableBitSet newGroup = groups.intersect(ImmutableBitSet.fromBitSet(cond.fields));
+        RelNode l = join.getLeft();
+        RelNode r = join.getRight();
+
+        int joinFieldCount = join.getRowType().getFieldCount();
+        int lFieldCount = l.getRowType().getFieldCount();
+
+        ImmutableBitSet groupL = newGroup.get(0, lFieldCount);
+        ImmutableBitSet groupR = newGroup.get(lFieldCount, joinFieldCount).shift(-lFieldCount);
+
+        if (isGroupingUnique(l, groupL)) {

Review comment:
   This method does something slightly different; honestly, I felt I was in trouble as soon as I gave it this name :)
   
   It checks whether the given columns contain a unique column somewhere in the covered joins. That still sounds fuzzy, so let's take an example.
   
   consider:
   ```
   select c.c_id, sum(i.i_prize) from customer c join item i on (i.c_id = c.c_id) group by c.c_id
   ```
   
   * the aggregate groups by the column C_ID and sums up something
   * below it is a join on C_ID
   * asking whether C_ID is a unique column on top of the join gives false; but there is a subtree in which C_ID is unique, so if we push the aggregate onto that branch, the aggregation there will be a no-op
   
   I think this case is not handled by `areColumnsUnique`.
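
To make the example concrete, here is a minimal, self-contained sketch of the idea in plain Java. It does not use Calcite: `Rel`, `Join`, `bits`, `contains` and `exampleJoin` are hypothetical stand-ins invented for illustration, the column layout of `customer` and `item` is assumed, and the toy join deliberately reports no unique keys of its own. The hunk above is cut off right after the check on the left input, so returning true when either side qualifies is an assumption of this sketch, not necessarily what the patch does.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class GroupingUniqueSketch {

  /** Hypothetical stand-in for a RelNode: only its width and known unique keys. */
  static class Rel {
    final int fieldCount;
    final List<BitSet> uniqueKeys = new ArrayList<>();

    Rel(int fieldCount) {
      this.fieldCount = fieldCount;
    }
  }

  /** Hypothetical join node; condFields are the join-condition columns in output positions. */
  static class Join extends Rel {
    final Rel left;
    final Rel right;
    final BitSet condFields;

    Join(Rel left, Rel right, BitSet condFields) {
      super(left.fieldCount + right.fieldCount);
      this.left = left;
      this.right = right;
      this.condFields = condFields;
      // Assumption: no unique keys are known on top of the join.
    }
  }

  static BitSet bits(int... indexes) {
    BitSet b = new BitSet();
    for (int i : indexes) {
      b.set(i);
    }
    return b;
  }

  /** True if every bit of subset is also set in set. */
  static boolean contains(BitSet set, BitSet subset) {
    BitSet rest = (BitSet) subset.clone();
    rest.andNot(set);
    return rest.isEmpty();
  }

  static boolean isGroupingUnique(Rel input, BitSet groups) {
    if (groups.isEmpty()) {
      return false;
    }
    for (BitSet key : input.uniqueKeys) {
      if (contains(groups, key)) {
        return true;
      }
    }
    if (input instanceof Join) {
      Join join = (Join) input;
      // Keep only the grouping columns that also take part in the join condition.
      BitSet newGroup = (BitSet) groups.clone();
      newGroup.and(join.condFields);
      int lFieldCount = join.left.fieldCount;
      // java.util.BitSet.get(from, to) re-bases the result to index 0,
      // so no explicit shift is needed for the right side here.
      BitSet groupL = newGroup.get(0, lFieldCount);
      BitSet groupR = newGroup.get(lFieldCount, join.fieldCount);
      // Assumption of this sketch: either side being unique is enough.
      return isGroupingUnique(join.left, groupL) || isGroupingUnique(join.right, groupR);
    }
    return false;
  }

  /** customer(c_id, c_name) join item(i_id, c_id, i_prize) on c.c_id = i.c_id. */
  static Join exampleJoin() {
    Rel customer = new Rel(2);
    customer.uniqueKeys.add(bits(0)); // c_id is unique in customer
    Rel item = new Rel(3);
    item.uniqueKeys.add(bits(0)); // i_id is unique in item
    // Join output: 0=c.c_id, 1=c.c_name, 2=i.i_id, 3=i.c_id, 4=i.i_prize
    return new Join(customer, item, bits(0, 3));
  }

  public static void main(String[] args) {
    Join join = exampleJoin();
    System.out.println(isGroupingUnique(join, bits(0))); // true: c.c_id is unique in customer
    System.out.println(isGroupingUnique(join, bits(3))); // false: i.c_id is not unique in item
    System.out.println(isGroupingUnique(join, bits(1))); // false: c_name is not a join column
  }
}
```

Grouping by `c.c_id` is not unique on top of the toy join, yet the recursion finds it unique in the `customer` branch, which is the situation described in the bullets above.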



