Posted to dev@hive.apache.org by "Deepak Jaiswal (JIRA)" <ji...@apache.org> on 2018/07/27 00:29:00 UTC

[jira] [Created] (HIVE-20252) Semijoin Reduction : Cycles due to semi join branch may remain undetected if small table side has a map join upstream.

Deepak Jaiswal created HIVE-20252:
-------------------------------------

             Summary: Semijoin Reduction : Cycles due to semi join branch may remain undetected if small table side has a map join upstream.
                 Key: HIVE-20252
                 URL: https://issues.apache.org/jira/browse/HIVE-20252
             Project: Hive
          Issue Type: Bug
            Reporter: Deepak Jaiswal
            Assignee: Deepak Jaiswal


For example:

 
 1 2018-07-26T17:22:14,664 DEBUG [51377701-dc98-424f-82e0-bbb5d6c84316 main] optimizer.SharedWorkOptimizer: Before SharedWorkOptimizer:
 2 TS[0]-FIL[96]-SEL[2]-MAPJOIN[156]-MAPJOIN[157]-MAPJOIN[161]-MAPJOIN[162]-FIL[47]-SEL[48]-MAPJOIN[163]-FIL[66]-SEL[67]-TNK[105]-GBY[68]-RS[69]-GBY[70]-SEL[71]-RS[72]-SEL[73]-LIM[74]-FS[75]
 3                                                           -SEL[142]-GBY[143]-RS[144]-GBY[145]-RS[155]
 4 TS[3]-FIL[97]-SEL[5]-RS[34]-MAPJOIN[156]
 5 TS[6]-FIL[98]-SEL[8]-RS[37]-MAPJOIN[157]
 6 TS[9]-FIL[99]-SEL[11]-MAPJOIN[158]-GBY[40]-RS[42]-MAPJOIN[161]
 7 TS[12]-FIL[100]-SEL[14]-RS[16]-MAPJOIN[158]
 8                       -SEL[131]-GBY[132]-EVENT[133]
 9 TS[19]-FIL[101]-SEL[21]-MAPJOIN[159]-GBY[29]-RS[30]-GBY[31]-SEL[32]-RS[45]-MAPJOIN[162]
10 TS[22]-FIL[102]-SEL[24]-RS[26]-MAPJOIN[159]
11                       -SEL[139]-GBY[140]-EVENT[141]
12 TS[49]-FIL[103]-SEL[51]-MAPJOIN[160]-GBY[59]-RS[60]-GBY[61]-SEL[62]-RS[64]-MAPJOIN[163]
13 TS[52]-FIL[104]-SEL[54]-RS[56]-MAPJOIN[160]
14                       -SEL[147]-GBY[148]-EVENT[149]
15
16
17 DPP information stored in the cache: {TS[19]=[EVENT[141]], TS[9]=[EVENT[133]], TS[49]=[RS[155], EVENT[149]]}

 

The semijoin branch on line 3 (ending in RS[155]) feeds TS[49] on line 12. That pipeline in turn feeds MAPJOIN[163] through RS[64], and MAPJOIN[163] is on line 2, the same pipeline the semijoin branch originates from. This closes a cycle.
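The cycle is easiest to see at the level of execution vertices rather than individual operators. The Java sketch below is a minimal model under stated assumptions, not Hive's code: it assumes Hive-on-Tez-style grouping where each RS ends a vertex and each map join runs in the vertex of its big-table input, so MAPJOIN[163] runs together with TS[0] and the semijoin source. The class name `VertexCycleSketch` and the vertex labels are hypothetical. The check itself is an ordinary depth-first search for a back edge.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class VertexCycleSketch {
    // Directed data-flow edges between hypothetical execution vertices.
    private final Map<String, List<String>> edges = new HashMap<>();

    void addEdge(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        edges.computeIfAbsent(to, k -> new ArrayList<>());
    }

    // DFS cycle check: reaching a vertex that is still on the current path means a cycle.
    boolean hasCycle() {
        Set<String> done = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String v : edges.keySet()) {
            if (dfs(v, done, onPath)) {
                return true;
            }
        }
        return false;
    }

    private boolean dfs(String v, Set<String> done, Set<String> onPath) {
        if (onPath.contains(v)) {
            return true;
        }
        if (done.contains(v)) {
            return false;
        }
        onPath.add(v);
        for (String next : edges.get(v)) {
            if (dfs(next, done, onPath)) {
                return true;
            }
        }
        onPath.remove(v);
        done.add(v);
        return false;
    }

    // Vertex grouping is an assumption: one vertex per RS boundary, map joins in the big-table vertex.
    static VertexCycleSketch buildExample(boolean withSemijoinEdge) {
        VertexCycleSketch g = new VertexCycleSketch();
        String big = "TS[0]..MAPJOIN[163]";      // line 2, includes the semijoin source
        String sjReducer = "GBY[145]-RS[155]";    // line 3, after RS[144]
        String scan49 = "TS[49]..MAPJOIN[160]";   // line 12
        String scan52 = "TS[52]..RS[56]";         // line 13, broadcast into MAPJOIN[160]
        String gbyReducer = "GBY[61]..RS[64]";    // line 12, after RS[60]
        g.addEdge(big, sjReducer);                // RS[144]
        g.addEdge(scan52, scan49);                // RS[56]
        g.addEdge(scan49, gbyReducer);            // RS[60]
        g.addEdge(gbyReducer, big);               // RS[64] into MAPJOIN[163]
        if (withSemijoinEdge) {
            g.addEdge(sjReducer, scan49);         // RS[155] semijoin edge into TS[49]
        }
        return g;
    }

    public static void main(String[] args) {
        System.out.println(buildExample(true).hasCycle());  // true
        System.out.println(buildExample(false).hasCycle()); // false
    }
}
```

Passing `false` drops the RS[155] edge and the loop disappears, which matches the point that the semijoin branch is what introduces the cycle.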


The cycle-detection logic may miss this because MAPJOIN[160] on line 12 has more than one upstream TS (TS[49] on line 12, and TS[52] on line 13 through RS[56]), which could cause the logic to look at the wrong TS. For complete coverage, the logic that finds the upstream TS operator must use findOperatorsUpstream() and examine every TS operator it returns.
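The failure mode can be illustrated with a small Java sketch. It is hypothetical throughout: `Op`, `firstParentScan`, and `allUpstreamScans` are illustrative names, not Hive APIs, and the source does not say which parent the existing logic follows. Here MAPJOIN[160] lists its small-table input RS[56] first, purely for illustration. A walk that always takes parent 0 from RS[64] reaches only TS[52]. A traversal that explores every parent path, in the spirit of findOperatorsUpstream(), reaches both TS[49] and TS[52].

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UpstreamScanSketch {
    // Hypothetical stand-in for a Hive operator: a name plus its parent operators.
    static final class Op {
        final String name;
        final List<Op> parents;

        Op(String name, Op... parents) {
            this.name = name;
            this.parents = Arrays.asList(parents);
        }

        boolean isTableScan() {
            return name.startsWith("TS[");
        }
    }

    // Fragile lookup: always follows parent 0, so it can reach only one TS.
    static Op firstParentScan(Op start) {
        Op cur = start;
        while (!cur.isTableScan()) {
            if (cur.parents.isEmpty()) {
                return null;
            }
            cur = cur.parents.get(0);
        }
        return cur;
    }

    // Complete lookup: explores every parent path and collects every TS reached.
    static Set<String> allUpstreamScans(Op start) {
        Set<String> scans = new LinkedHashSet<>();
        Set<Op> seen = new HashSet<>();
        Deque<Op> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            Op op = stack.pop();
            if (!seen.add(op)) {
                continue;
            }
            if (op.isTableScan()) {
                scans.add(op.name);
            }
            for (Op parent : op.parents) {
                stack.push(parent);
            }
        }
        return scans;
    }

    // Lines 12 and 13 of the log, from the scans up to RS[64].
    static Op buildExample() {
        Op sel51 = new Op("SEL[51]", new Op("FIL[103]", new Op("TS[49]")));
        Op rs56 = new Op("RS[56]", new Op("SEL[54]", new Op("FIL[104]", new Op("TS[52]"))));
        // Hypothetical parent order: the small-table input RS[56] is listed first.
        Op mapjoin160 = new Op("MAPJOIN[160]", rs56, sel51);
        Op gby61 = new Op("GBY[61]", new Op("RS[60]", new Op("GBY[59]", mapjoin160)));
        return new Op("RS[64]", new Op("SEL[62]", gby61));
    }

    public static void main(String[] args) {
        Op rs64 = buildExample();
        System.out.println(firstParentScan(rs64).name); // TS[52]
        System.out.println(allUpstreamScans(rs64));     // [TS[49], TS[52]]
    }
}
```

A cycle check that asks whether the semijoin target TS[49] lies upstream of MAPJOIN[163]'s small-table input would miss it with the first lookup and find it with the second.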

 

cc [~jcamachorodriguez]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)