Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/01/26 22:46:28 UTC

[GitHub] [hive] jcamachor commented on a change in pull request #1878: [DRAFT] Remove HiveSubQRemoveRelBuilder

jcamachor commented on a change in pull request #1878:
URL: https://github.com/apache/hive/pull/1878#discussion_r564886525



##########
File path: ql/src/test/results/clientpositive/llap/subquery_in.q.out
##########
@@ -408,12 +408,18 @@ STAGE PLANS:
                     expressions: (UDFToDouble(_col0) / _col1) (type: double)
                     outputColumnNames: _col0
                     Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
-                    Reduce Output Operator
-                      key expressions: _col0 (type: double)
-                      null sort order: z
-                      sort order: +
-                      Map-reduce partition columns: _col0 (type: double)
+                    Group By Operator

Review comment:
@vineetgarg02, I was checking this. The previous plan executed an inner join; this plan executes a semijoin. From looking at the code, it seems that for a semijoin we create a map-side Group By operator unconditionally, without considering whether that group by would actually reduce the input data: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L9406 . That may not be too bad, since the group by can internally switch to streaming mode if it is not reducing the input size.
From your comment, though, I understand that some optimization may have kicked in to remove that group by? Could you elaborate?
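
To make the streaming fallback mentioned above concrete, here is a minimal sketch of the idea, not Hive's actual GroupByOperator code. The class name, the `checkInterval` and `minReduction` parameters, and the single-check policy are all assumptions for illustration; they are only loosely modeled on what settings such as `hive.groupby.mapaggr.checkinterval` and `hive.map.aggr.hash.min.reduction` appear to control.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of a map-side distinct (group by with no aggregates)
 * that gives up on hashing when it is not reducing its input.
 * This is NOT Hive's implementation; all names here are illustrative.
 */
public class MapSideDistinctSketch {
    private final int checkInterval;   // assumed: rows to observe before deciding
    private final double minReduction; // assumed: max acceptable distinctKeys / rowsSeen
    private final Set<String> seen = new LinkedHashSet<>();
    private final List<String> output = new ArrayList<>();
    private long rowsSeen = 0;
    private boolean streaming = false;

    public MapSideDistinctSketch(int checkInterval, double minReduction) {
        this.checkInterval = checkInterval;
        this.minReduction = minReduction;
    }

    /** Consume one key from the map-side input. */
    public void process(String key) {
        if (streaming) {
            // Hashing was abandoned: forward rows unchanged, duplicates included.
            output.add(key);
            return;
        }
        rowsSeen++;
        seen.add(key);
        // Check once, after checkInterval rows: is the hash table reducing enough?
        if (rowsSeen == checkInterval && seen.size() > minReduction * rowsSeen) {
            output.addAll(seen); // flush what was aggregated so far
            seen.clear();
            streaming = true;
        }
    }

    /** Flush any keys still held in the hash table and return everything emitted. */
    public List<String> close() {
        output.addAll(seen);
        seen.clear();
        return output;
    }

    public boolean isStreaming() {
        return streaming;
    }
}
```

In this sketch, duplicates forwarded in streaming mode are harmless for a semijoin, because the join itself only tests for the existence of a match; the map-side group by is purely a data-reduction step.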



