Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/07/30 07:29:00 UTC

[jira] [Work logged] (HIVE-23939) SharedWorkOptimizer: take the union of columns in mergeable TableScans

     [ https://issues.apache.org/jira/browse/HIVE-23939?focusedWorklogId=464314&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-464314 ]

ASF GitHub Bot logged work on HIVE-23939:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 30/Jul/20 07:28
            Start Date: 30/Jul/20 07:28
    Worklog Time Spent: 10m 
      Work Description: kasakrisz commented on pull request #1324:
URL: https://github.com/apache/hive/pull/1324#issuecomment-666155487


   @HunterL
   Thanks for reviewing this patch.
   The expression
   ```
   filterExpr: (((s_floor_space > 1000) and s_store_sk is not null) or s_store_sk is not null)
   ```
   is the result of merging two `TableScanOperator`s. Both of them scan the same table (`alias: s`) but had different `filterExpr`s:
   TS1: ((s_floor_space > 1000) and s_store_sk is not null)
   TS2: s_store_sk is not null
   
   SharedWorkOptimizer naively combined the filter expressions using `or` because we need the union of the records produced by both TSs. You are right that in this particular case the filter expression could be reduced to `s_store_sk is not null`, since TS1's predicate already implies TS2's.
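The reduction is the absorption law, `(A and B) or B` = `B`, with A = `s_floor_space > 1000` and B = `s_store_sk is not null`. As a quick sanity check, here is a minimal Python sketch that verifies the identity under SQL's three-valued logic. It is not Hive code: `sql_and`, `sql_or` and `absorption_holds` are hypothetical helpers, and UNKNOWN is modeled as `None`, so the check also covers rows where `s_floor_space` is NULL.

```python
from itertools import product

# SQL three-valued logic: True, False, or None (UNKNOWN).
VALUES = (True, False, None)

def sql_and(a, b):
    # FALSE dominates AND; otherwise any UNKNOWN makes the result UNKNOWN.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def sql_or(a, b):
    # TRUE dominates OR; otherwise any UNKNOWN makes the result UNKNOWN.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def absorption_holds():
    # Check (A AND B) OR B == B for every combination of A and B.
    return all(sql_or(sql_and(a, b), b) is b for a, b in product(VALUES, repeat=2))

print(absorption_holds())  # True
```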
   
   The new TS has two children:
   ```
   TableScan
     alias: s
     filterExpr: (((s_floor_space > 1000) and s_store_sk is not null) or s_store_sk is not null) (type: boolean)
     Filter Operator
       predicate: ((s_floor_space > 1000) and s_store_sk is not null) (type: boolean)
       ...
     Filter Operator
       predicate: s_store_sk is not null (type: boolean)
       ...
   ```
   Both of them are `Filter Operator`s, each the root of a subtree that broadcasts the proper subset of records to its reducer edge (Reducer 2 and Reducer 3).
   
   If `and` were used to combine the filter expressions of the TS operators, the branch without the `s_floor_space > 1000` filter would lose some of its records.
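To illustrate the difference, the following toy Python model runs one shared scan with a combined filter, then re-applies each branch's own predicate, mirroring the two child Filter Operators. This is an assumption made for illustration, not how Hive evaluates operators. The sample rows and helper names (`ROWS`, `ts1`, `ts2`, `run_branches`) are hypothetical. A comparison against NULL is treated as filtering the row out, as a SQL filter does.

```python
# Hypothetical sample rows of table `s`; None stands for SQL NULL.
ROWS = [
    {"s_store_sk": 1, "s_floor_space": 5000},
    {"s_store_sk": 2, "s_floor_space": 800},
    {"s_store_sk": None, "s_floor_space": 9000},
    {"s_store_sk": 4, "s_floor_space": None},
]

def ts1(r):
    # (s_floor_space > 1000) and s_store_sk is not null; a NULL comparison drops the row.
    return (r["s_floor_space"] is not None and r["s_floor_space"] > 1000
            and r["s_store_sk"] is not None)

def ts2(r):
    # s_store_sk is not null
    return r["s_store_sk"] is not None

def run_branches(scan_filter):
    # One shared scan, then each child Filter Operator re-applies its own predicate.
    scanned = [r for r in ROWS if scan_filter(r)]
    branch1 = [r["s_store_sk"] for r in scanned if ts1(r)]
    branch2 = [r["s_store_sk"] for r in scanned if ts2(r)]
    return branch1, branch2

# Reference result: two independent scans, one per original TableScan.
separate = ([r["s_store_sk"] for r in ROWS if ts1(r)],
            [r["s_store_sk"] for r in ROWS if ts2(r)])

merged_or = run_branches(lambda r: ts1(r) or ts2(r))
merged_and = run_branches(lambda r: ts1(r) and ts2(r))

print(merged_or == separate)   # True
print(merged_and == separate)  # False: branch 2 loses stores 2 and 4
print(merged_and)              # ([1], [1])
```

With `or`, each branch receives exactly the rows it would have seen from its own scan. With `and`, the second branch only sees rows that also satisfy `s_floor_space > 1000`.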


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 464314)
    Remaining Estimate: 0h
            Time Spent: 10m

> SharedWorkOptimizer: take the union of columns in mergeable TableScans
> ----------------------------------------------------------------------
>
>                 Key: HIVE-23939
>                 URL: https://issues.apache.org/jira/browse/HIVE-23939
>             Project: Hive
>          Issue Type: Improvement
>          Components: Physical Optimizer
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> POSTHOOK: query: explain
> select case when (select count(*) 
>                   from store_sales 
>                   where ss_quantity between 1 and 20) > 409437
>             then (select avg(ss_ext_list_price) 
>                   from store_sales 
>                   where ss_quantity between 1 and 20) 
>             else (select avg(ss_net_paid_inc_tax)
>                   from store_sales
>                   where ss_quantity between 1 and 20) end bucket1 ,
>        case when (select count(*)
>                   from store_sales
>                   where ss_quantity between 21 and 40) > 4595804
>             then (select avg(ss_ext_list_price)
>                   from store_sales
>                   where ss_quantity between 21 and 40) 
>             else (select avg(ss_net_paid_inc_tax)
>                   from store_sales
>                   where ss_quantity between 21 and 40) end bucket2,
>        case when (select count(*)
>                   from store_sales
>                   where ss_quantity between 41 and 60) > 7887297
>             then (select avg(ss_ext_list_price)
>                   from store_sales
>                   where ss_quantity between 41 and 60)
>             else (select avg(ss_net_paid_inc_tax)
>                   from store_sales
>                   where ss_quantity between 41 and 60) end bucket3,
>        case when (select count(*)
>                   from store_sales
>                   where ss_quantity between 61 and 80) > 10872978
>             then (select avg(ss_ext_list_price)
>                   from store_sales
>                   where ss_quantity between 61 and 80)
>             else (select avg(ss_net_paid_inc_tax)
>                   from store_sales
>                   where ss_quantity between 61 and 80) end bucket4,
>        case when (select count(*)
>                   from store_sales
>                   where ss_quantity between 81 and 100) > 43571537
>             then (select avg(ss_ext_list_price)
>                   from store_sales
>                   where ss_quantity between 81 and 100)
>             else (select avg(ss_net_paid_inc_tax)
>                   from store_sales
>                   where ss_quantity between 81 and 100) end bucket5
> from reason
> where r_reason_sk = 1
> POSTHOOK: type: QUERY
> POSTHOOK: Input: default@reason
> POSTHOOK: Input: default@store_sales
> POSTHOOK: Output: hdfs://### HDFS PATH ###
> Plan optimized by CBO.
> Vertex dependency in root stage
> Reducer 10 <- Reducer 34 (CUSTOM_SIMPLE_EDGE), Reducer 9 (CUSTOM_SIMPLE_EDGE)
> Reducer 11 <- Reducer 10 (CUSTOM_SIMPLE_EDGE), Reducer 18 (CUSTOM_SIMPLE_EDGE)
> Reducer 12 <- Reducer 11 (CUSTOM_SIMPLE_EDGE), Reducer 24 (CUSTOM_SIMPLE_EDGE)
> Reducer 13 <- Reducer 12 (CUSTOM_SIMPLE_EDGE), Reducer 30 (CUSTOM_SIMPLE_EDGE)
> Reducer 14 <- Reducer 13 (CUSTOM_SIMPLE_EDGE), Reducer 19 (CUSTOM_SIMPLE_EDGE)
> Reducer 15 <- Reducer 14 (CUSTOM_SIMPLE_EDGE), Reducer 25 (CUSTOM_SIMPLE_EDGE)
> Reducer 16 <- Reducer 15 (CUSTOM_SIMPLE_EDGE), Reducer 31 (CUSTOM_SIMPLE_EDGE)
> Reducer 18 <- Map 17 (CUSTOM_SIMPLE_EDGE)
> Reducer 19 <- Map 17 (CUSTOM_SIMPLE_EDGE)
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE), Reducer 20 (CUSTOM_SIMPLE_EDGE)
> Reducer 20 <- Map 17 (CUSTOM_SIMPLE_EDGE)
> Reducer 21 <- Map 17 (CUSTOM_SIMPLE_EDGE)
> Reducer 22 <- Map 17 (CUSTOM_SIMPLE_EDGE)
> Reducer 24 <- Map 23 (CUSTOM_SIMPLE_EDGE)
> Reducer 25 <- Map 23 (CUSTOM_SIMPLE_EDGE)
> Reducer 26 <- Map 23 (CUSTOM_SIMPLE_EDGE)
> Reducer 27 <- Map 23 (CUSTOM_SIMPLE_EDGE)
> Reducer 28 <- Map 23 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (CUSTOM_SIMPLE_EDGE), Reducer 26 (CUSTOM_SIMPLE_EDGE)
> Reducer 30 <- Map 29 (CUSTOM_SIMPLE_EDGE)
> Reducer 31 <- Map 29 (CUSTOM_SIMPLE_EDGE)
> Reducer 32 <- Map 29 (CUSTOM_SIMPLE_EDGE)
> Reducer 33 <- Map 29 (CUSTOM_SIMPLE_EDGE)
> Reducer 34 <- Map 29 (CUSTOM_SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE), Reducer 32 (CUSTOM_SIMPLE_EDGE)
> Reducer 5 <- Reducer 21 (CUSTOM_SIMPLE_EDGE), Reducer 4 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Reducer 27 (CUSTOM_SIMPLE_EDGE), Reducer 5 (CUSTOM_SIMPLE_EDGE)
> Reducer 7 <- Reducer 33 (CUSTOM_SIMPLE_EDGE), Reducer 6 (CUSTOM_SIMPLE_EDGE)
> Reducer 8 <- Reducer 22 (CUSTOM_SIMPLE_EDGE), Reducer 7 (CUSTOM_SIMPLE_EDGE)
> Reducer 9 <- Reducer 28 (CUSTOM_SIMPLE_EDGE), Reducer 8 (CUSTOM_SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
>     limit:-1
>     Stage-1
>       Reducer 16
>       File Output Operator [FS_154]
>         Select Operator [SEL_153] (rows=2 width=560)
>           Output:["_col0","_col1","_col2","_col3","_col4"]
>           Merge Join Operator [MERGEJOIN_185] (rows=2 width=1140)
>             Conds:(Left Outer),Output:["_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15"]
>           <-Reducer 15 [CUSTOM_SIMPLE_EDGE]
>             PARTITION_ONLY_SHUFFLE [RS_150]
>               Merge Join Operator [MERGEJOIN_184] (rows=2 width=1028)
>                 Conds:(Left Outer),Output:["_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14"]
>               <-Reducer 14 [CUSTOM_SIMPLE_EDGE]
>                 PARTITION_ONLY_SHUFFLE [RS_147]
>                   Merge Join Operator [MERGEJOIN_183] (rows=2 width=916)
>                     Conds:(Left Outer),Output:["_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13"]
>                   <-Reducer 13 [CUSTOM_SIMPLE_EDGE]
>                     PARTITION_ONLY_SHUFFLE [RS_144]
>                       Merge Join Operator [MERGEJOIN_182] (rows=2 width=912)
>                         Conds:(Left Outer),Output:["_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12"]
>                       <-Reducer 12 [CUSTOM_SIMPLE_EDGE]
>                         PARTITION_ONLY_SHUFFLE [RS_141]
>                           Merge Join Operator [MERGEJOIN_181] (rows=2 width=800)
>                             Conds:(Left Outer),Output:["_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11"]
>                           <-Reducer 11 [CUSTOM_SIMPLE_EDGE]
>                             PARTITION_ONLY_SHUFFLE [RS_138]
>                               Merge Join Operator [MERGEJOIN_180] (rows=2 width=688)
>                                 Conds:(Left Outer),Output:["_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10"]
>                               <-Reducer 10 [CUSTOM_SIMPLE_EDGE]
>                                 PARTITION_ONLY_SHUFFLE [RS_135]
>                                   Merge Join Operator [MERGEJOIN_179] (rows=2 width=684)
>                                     Conds:(Left Outer),Output:["_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"]
>                                   <-Reducer 34 [CUSTOM_SIMPLE_EDGE] vectorized
>                                     PARTITION_ONLY_SHUFFLE [RS_275]
>                                       Select Operator [SEL_274] (rows=1 width=112)
>                                         Output:["_col0"]
>                                         Group By Operator [GBY_273] (rows=1 width=120)
>                                           Output:["_col0","_col1"],aggregations:["sum(VALUE._col0)","count(VALUE._col1)"]
>                                         <-Map 29 [CUSTOM_SIMPLE_EDGE] vectorized
>                                           PARTITION_ONLY_SHUFFLE [RS_254]
>                                             Group By Operator [GBY_249] (rows=1 width=120)
>                                               Output:["_col0","_col1"],aggregations:["sum(ss_net_paid_inc_tax)","count(ss_net_paid_inc_tax)"]
>                                               Select Operator [SEL_244] (rows=182855757 width=110)
>                                                 Output:["ss_net_paid_inc_tax"]
>                                                 Filter Operator [FIL_239] (rows=182855757 width=110)
>                                                   predicate:ss_quantity BETWEEN 41 AND 60
>                                                   TableScan [TS_80] (rows=575995635 width=110)
>                                                     default@store_sales,store_sales,Tbl:COMPLETE,Col:COMPLETE,Output:["ss_quantity","ss_net_paid_inc_tax"]
>                                   <-Reducer 9 [CUSTOM_SIMPLE_EDGE]
>                                     PARTITION_ONLY_SHUFFLE [RS_132]
>                                       Merge Join Operator [MERGEJOIN_178] (rows=2 width=572)
>                                         Conds:(Left Outer),Output:["_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>                                       <-Reducer 28 [CUSTOM_SIMPLE_EDGE] vectorized
>                                         PARTITION_ONLY_SHUFFLE [RS_272]
>                                           Select Operator [SEL_271] (rows=1 width=112)
>                                             Output:["_col0"]
>                                             Group By Operator [GBY_270] (rows=1 width=120)
>                                               Output:["_col0","_col1"],aggregations:["sum(VALUE._col0)","count(VALUE._col1)"]
>                                             <-Map 23 [CUSTOM_SIMPLE_EDGE] vectorized
>                                               PARTITION_ONLY_SHUFFLE [RS_231]
>                                                 Group By Operator [GBY_226] (rows=1 width=120)
>                                                   Output:["_col0","_col1"],aggregations:["sum(ss_ext_list_price)","count(ss_ext_list_price)"]
>                                                   Select Operator [SEL_221] (rows=182855757 width=110)
>                                                     Output:["ss_ext_list_price"]
>                                                     Filter Operator [FIL_216] (rows=182855757 width=110)
>                                                       predicate:ss_quantity BETWEEN 41 AND 60
>                                                       TableScan [TS_73] (rows=575995635 width=110)
>                                                         default@store_sales,store_sales,Tbl:COMPLETE,Col:COMPLETE,Output:["ss_quantity","ss_ext_list_price"]
>                                       <-Reducer 8 [CUSTOM_SIMPLE_EDGE]
>                                         PARTITION_ONLY_SHUFFLE [RS_129]
>                                           Merge Join Operator [MERGEJOIN_177] (rows=2 width=460)
>                                             Conds:(Left Outer),Output:["_col1","_col2","_col3","_col4","_col5","_col6","_col7"]
>                                           <-Reducer 22 [CUSTOM_SIMPLE_EDGE] vectorized
>                                             PARTITION_ONLY_SHUFFLE [RS_269]
>                                               Select Operator [SEL_268] (rows=1 width=4)
>                                                 Output:["_col0"]
>                                                 Group By Operator [GBY_267] (rows=1 width=8)
>                                                   Output:["_col0"],aggregations:["count(VALUE._col0)"]
>                                                 <-Map 17 [CUSTOM_SIMPLE_EDGE] vectorized
>                                                   PARTITION_ONLY_SHUFFLE [RS_208]
>                                                     Group By Operator [GBY_203] (rows=1 width=8)
>                                                       Output:["_col0"],aggregations:["count()"]
>                                                       Select Operator [SEL_198] (rows=182855757 width=3)
>                                                         Filter Operator [FIL_193] (rows=182855757 width=3)
>                                                           predicate:ss_quantity BETWEEN 41 AND 60
>                                                           TableScan [TS_66] (rows=575995635 width=3)
>                                                             default@store_sales,store_sales,Tbl:COMPLETE,Col:COMPLETE,Output:["ss_quantity"]
>                                           <-Reducer 7 [CUSTOM_SIMPLE_EDGE]
>                                             PARTITION_ONLY_SHUFFLE [RS_126]
>                                               Merge Join Operator [MERGEJOIN_176] (rows=2 width=456)
>                                                 Conds:(Left Outer),Output:["_col1","_col2","_col3","_col4","_col5","_col6"]
>                                               <-Reducer 33 [CUSTOM_SIMPLE_EDGE] vectorized
>                                                 PARTITION_ONLY_SHUFFLE [RS_266]
>                                                   Select Operator [SEL_265] (rows=1 width=112)
>                                                     Output:["_col0"]
>                                                     Group By Operator [GBY_264] (rows=1 width=120)
>                                                       Output:["_col0","_col1"],aggregations:["sum(VALUE._col0)","count(VALUE._col1)"]
>                                                     <-Map 29 [CUSTOM_SIMPLE_EDGE] vectorized
>                                                       PARTITION_ONLY_SHUFFLE [RS_253]
>                                                         Group By Operator [GBY_248] (rows=1 width=120)
>                                                           Output:["_col0","_col1"],aggregations:["sum(ss_net_paid_inc_tax)","count(ss_net_paid_inc_tax)"]
>                                                           Select Operator [SEL_243] (rows=182855757 width=110)
>                                                             Output:["ss_net_paid_inc_tax"]
>                                                             Filter Operator [FIL_238] (rows=182855757 width=110)
>                                                               predicate:ss_quantity BETWEEN 21 AND 40
>                                                                Please refer to the previous TableScan [TS_80]
>                                               <-Reducer 6 [CUSTOM_SIMPLE_EDGE]
>                                                 PARTITION_ONLY_SHUFFLE [RS_123]
>                                                   Merge Join Operator [MERGEJOIN_175] (rows=2 width=344)
>                                                     Conds:(Left Outer),Output:["_col1","_col2","_col3","_col4","_col5"]
>                                                   <-Reducer 27 [CUSTOM_SIMPLE_EDGE] vectorized
>                                                     PARTITION_ONLY_SHUFFLE [RS_263]
>                                                       Select Operator [SEL_262] (rows=1 width=112)
>                                                         Output:["_col0"]
>                                                         Group By Operator [GBY_261] (rows=1 width=120)
>                                                           Output:["_col0","_col1"],aggregations:["sum(VALUE._col0)","count(VALUE._col1)"]
>                                                         <-Map 23 [CUSTOM_SIMPLE_EDGE] vectorized
>                                                           PARTITION_ONLY_SHUFFLE [RS_230]
>                                                             Group By Operator [GBY_225] (rows=1 width=120)
>                                                               Output:["_col0","_col1"],aggregations:["sum(ss_ext_list_price)","count(ss_ext_list_price)"]
>                                                               Select Operator [SEL_220] (rows=182855757 width=110)
>                                                                 Output:["ss_ext_list_price"]
>                                                                 Filter Operator [FIL_215] (rows=182855757 width=110)
>                                                                   predicate:ss_quantity BETWEEN 21 AND 40
>                                                                    Please refer to the previous TableScan [TS_73]
>                                                   <-Reducer 5 [CUSTOM_SIMPLE_EDGE]
>                                                     PARTITION_ONLY_SHUFFLE [RS_120]
>                                                       Merge Join Operator [MERGEJOIN_174] (rows=2 width=232)
>                                                         Conds:(Left Outer),Output:["_col1","_col2","_col3","_col4"]
>                                                       <-Reducer 21 [CUSTOM_SIMPLE_EDGE] vectorized
>                                                         PARTITION_ONLY_SHUFFLE [RS_260]
>                                                           Select Operator [SEL_259] (rows=1 width=4)
>                                                             Output:["_col0"]
>                                                             Group By Operator [GBY_258] (rows=1 width=8)
>                                                               Output:["_col0"],aggregations:["count(VALUE._col0)"]
>                                                             <-Map 17 [CUSTOM_SIMPLE_EDGE] vectorized
>                                                               PARTITION_ONLY_SHUFFLE [RS_207]
>                                                                 Group By Operator [GBY_202] (rows=1 width=8)
>                                                                   Output:["_col0"],aggregations:["count()"]
>                                                                   Select Operator [SEL_197] (rows=182855757 width=3)
>                                                                     Filter Operator [FIL_192] (rows=182855757 width=3)
>                                                                       predicate:ss_quantity BETWEEN 21 AND 40
>                                                                        Please refer to the previous TableScan [TS_66]
>                                                       <-Reducer 4 [CUSTOM_SIMPLE_EDGE]
>                                                         PARTITION_ONLY_SHUFFLE [RS_117]
>                                                           Merge Join Operator [MERGEJOIN_173] (rows=2 width=228)
>                                                             Conds:(Left Outer),Output:["_col1","_col2","_col3"]
>                                                           <-Reducer 3 [CUSTOM_SIMPLE_EDGE]
>                                                             PARTITION_ONLY_SHUFFLE [RS_114]
>                                                               Merge Join Operator [MERGEJOIN_172] (rows=2 width=116)
>                                                                 Conds:(Left Outer),Output:["_col1","_col2"]
>                                                               <-Reducer 2 [CUSTOM_SIMPLE_EDGE]
>                                                                 PARTITION_ONLY_SHUFFLE [RS_111]
>                                                                   Merge Join Operator [MERGEJOIN_171] (rows=2 width=4)
>                                                                     Conds:(Left Outer),Output:["_col1"]
>                                                                   <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized
>                                                                     PARTITION_ONLY_SHUFFLE [RS_188]
>                                                                       Select Operator [SEL_187] (rows=2 width=4)
>                                                                         Filter Operator [FIL_186] (rows=2 width=4)
>                                                                           predicate:(r_reason_sk = 1)
>                                                                           TableScan [TS_0] (rows=72 width=4)
>                                                                             default@reason,reason,Tbl:COMPLETE,Col:COMPLETE,Output:["r_reason_sk"]
>                                                                   <-Reducer 20 [CUSTOM_SIMPLE_EDGE] vectorized
>                                                                     PARTITION_ONLY_SHUFFLE [RS_211]
>                                                                       Select Operator [SEL_210] (rows=1 width=4)
>                                                                         Output:["_col0"]
>                                                                         Group By Operator [GBY_209] (rows=1 width=8)
>                                                                           Output:["_col0"],aggregations:["count(VALUE._col0)"]
>                                                                         <-Map 17 [CUSTOM_SIMPLE_EDGE] vectorized
>                                                                           PARTITION_ONLY_SHUFFLE [RS_206]
>                                                                             Group By Operator [GBY_201] (rows=1 width=8)
>                                                                               Output:["_col0"],aggregations:["count()"]
>                                                                               Select Operator [SEL_196] (rows=182855757 width=3)
>                                                                                 Filter Operator [FIL_191] (rows=182855757 width=3)
>                                                                                   predicate:ss_quantity BETWEEN 1 AND 20
>                                                                                    Please refer to the previous TableScan [TS_66]
>                                                               <-Reducer 26 [CUSTOM_SIMPLE_EDGE] vectorized
>                                                                 PARTITION_ONLY_SHUFFLE [RS_234]
>                                                                   Select Operator [SEL_233] (rows=1 width=112)
>                                                                     Output:["_col0"]
>                                                                     Group By Operator [GBY_232] (rows=1 width=120)
>                                                                       Output:["_col0","_col1"],aggregations:["sum(VALUE._col0)","count(VALUE._col1)"]
>                                                                     <-Map 23 [CUSTOM_SIMPLE_EDGE] vectorized
>                                                                       PARTITION_ONLY_SHUFFLE [RS_229]
>                                                                         Group By Operator [GBY_224] (rows=1 width=120)
>                                                                           Output:["_col0","_col1"],aggregations:["sum(ss_ext_list_price)","count(ss_ext_list_price)"]
>                                                                           Select Operator [SEL_219] (rows=182855757 width=110)
>                                                                             Output:["ss_ext_list_price"]
>                                                                             Filter Operator [FIL_214] (rows=182855757 width=110)
>                                                                               predicate:ss_quantity BETWEEN 1 AND 20
>                                                                                Please refer to the previous TableScan [TS_73]
>                                                           <-Reducer 32 [CUSTOM_SIMPLE_EDGE] vectorized
>                                                             PARTITION_ONLY_SHUFFLE [RS_257]
>                                                               Select Operator [SEL_256] (rows=1 width=112)
>                                                                 Output:["_col0"]
>                                                                 Group By Operator [GBY_255] (rows=1 width=120)
>                                                                   Output:["_col0","_col1"],aggregations:["sum(VALUE._col0)","count(VALUE._col1)"]
>                                                                 <-Map 29 [CUSTOM_SIMPLE_EDGE] vectorized
>                                                                   PARTITION_ONLY_SHUFFLE [RS_252]
>                                                                     Group By Operator [GBY_247] (rows=1 width=120)
>                                                                       Output:["_col0","_col1"],aggregations:["sum(ss_net_paid_inc_tax)","count(ss_net_paid_inc_tax)"]
>                                                                       Select Operator [SEL_242] (rows=182855757 width=110)
>                                                                         Output:["ss_net_paid_inc_tax"]
>                                                                         Filter Operator [FIL_237] (rows=182855757 width=110)
>                                                                           predicate:ss_quantity BETWEEN 1 AND 20
>                                                                            Please refer to the previous TableScan [TS_80]
>                               <-Reducer 18 [CUSTOM_SIMPLE_EDGE] vectorized
>                                 PARTITION_ONLY_SHUFFLE [RS_278]
>                                   Select Operator [SEL_277] (rows=1 width=4)
>                                     Output:["_col0"]
>                                     Group By Operator [GBY_276] (rows=1 width=8)
>                                       Output:["_col0"],aggregations:["count(VALUE._col0)"]
>                                     <-Map 17 [CUSTOM_SIMPLE_EDGE] vectorized
>                                       PARTITION_ONLY_SHUFFLE [RS_204]
>                                         Group By Operator [GBY_199] (rows=1 width=8)
>                                           Output:["_col0"],aggregations:["count()"]
>                                           Select Operator [SEL_194] (rows=182855757 width=3)
>                                             Filter Operator [FIL_189] (rows=182855757 width=3)
>                                               predicate:ss_quantity BETWEEN 61 AND 80
>                                                Please refer to the previous TableScan [TS_66]
>                           <-Reducer 24 [CUSTOM_SIMPLE_EDGE] vectorized
>                             PARTITION_ONLY_SHUFFLE [RS_281]
>                               Select Operator [SEL_280] (rows=1 width=112)
>                                 Output:["_col0"]
>                                 Group By Operator [GBY_279] (rows=1 width=120)
>                                   Output:["_col0","_col1"],aggregations:["sum(VALUE._col0)","count(VALUE._col1)"]
>                                 <-Map 23 [CUSTOM_SIMPLE_EDGE] vectorized
>                                   PARTITION_ONLY_SHUFFLE [RS_227]
>                                     Group By Operator [GBY_222] (rows=1 width=120)
>                                       Output:["_col0","_col1"],aggregations:["sum(ss_ext_list_price)","count(ss_ext_list_price)"]
>                                       Select Operator [SEL_217] (rows=182855757 width=110)
>                                         Output:["ss_ext_list_price"]
>                                         Filter Operator [FIL_212] (rows=182855757 width=110)
>                                           predicate:ss_quantity BETWEEN 61 AND 80
>                                            Please refer to the previous TableScan [TS_73]
>                       <-Reducer 30 [CUSTOM_SIMPLE_EDGE] vectorized
>                         PARTITION_ONLY_SHUFFLE [RS_284]
>                           Select Operator [SEL_283] (rows=1 width=112)
>                             Output:["_col0"]
>                             Group By Operator [GBY_282] (rows=1 width=120)
>                               Output:["_col0","_col1"],aggregations:["sum(VALUE._col0)","count(VALUE._col1)"]
>                             <-Map 29 [CUSTOM_SIMPLE_EDGE] vectorized
>                               PARTITION_ONLY_SHUFFLE [RS_250]
>                                 Group By Operator [GBY_245] (rows=1 width=120)
>                                   Output:["_col0","_col1"],aggregations:["sum(ss_net_paid_inc_tax)","count(ss_net_paid_inc_tax)"]
>                                   Select Operator [SEL_240] (rows=182855757 width=110)
>                                     Output:["ss_net_paid_inc_tax"]
>                                     Filter Operator [FIL_235] (rows=182855757 width=110)
>                                       predicate:ss_quantity BETWEEN 61 AND 80
>                                        Please refer to the previous TableScan [TS_80]
>                   <-Reducer 19 [CUSTOM_SIMPLE_EDGE] vectorized
>                     PARTITION_ONLY_SHUFFLE [RS_287]
>                       Select Operator [SEL_286] (rows=1 width=4)
>                         Output:["_col0"]
>                         Group By Operator [GBY_285] (rows=1 width=8)
>                           Output:["_col0"],aggregations:["count(VALUE._col0)"]
>                         <-Map 17 [CUSTOM_SIMPLE_EDGE] vectorized
>                           PARTITION_ONLY_SHUFFLE [RS_205]
>                             Group By Operator [GBY_200] (rows=1 width=8)
>                               Output:["_col0"],aggregations:["count()"]
>                               Select Operator [SEL_195] (rows=182855757 width=3)
>                                 Filter Operator [FIL_190] (rows=182855757 width=3)
>                                   predicate:ss_quantity BETWEEN 81 AND 100
>                                    Please refer to the previous TableScan [TS_66]
>               <-Reducer 25 [CUSTOM_SIMPLE_EDGE] vectorized
>                 PARTITION_ONLY_SHUFFLE [RS_290]
>                   Select Operator [SEL_289] (rows=1 width=112)
>                     Output:["_col0"]
>                     Group By Operator [GBY_288] (rows=1 width=120)
>                       Output:["_col0","_col1"],aggregations:["sum(VALUE._col0)","count(VALUE._col1)"]
>                     <-Map 23 [CUSTOM_SIMPLE_EDGE] vectorized
>                       PARTITION_ONLY_SHUFFLE [RS_228]
>                         Group By Operator [GBY_223] (rows=1 width=120)
>                           Output:["_col0","_col1"],aggregations:["sum(ss_ext_list_price)","count(ss_ext_list_price)"]
>                           Select Operator [SEL_218] (rows=182855757 width=110)
>                             Output:["ss_ext_list_price"]
>                             Filter Operator [FIL_213] (rows=182855757 width=110)
>                               predicate:ss_quantity BETWEEN 81 AND 100
>                                Please refer to the previous TableScan [TS_73]
>           <-Reducer 31 [CUSTOM_SIMPLE_EDGE] vectorized
>             PARTITION_ONLY_SHUFFLE [RS_293]
>               Select Operator [SEL_292] (rows=1 width=112)
>                 Output:["_col0"]
>                 Group By Operator [GBY_291] (rows=1 width=120)
>                   Output:["_col0","_col1"],aggregations:["sum(VALUE._col0)","count(VALUE._col1)"]
>                 <-Map 29 [CUSTOM_SIMPLE_EDGE] vectorized
>                   PARTITION_ONLY_SHUFFLE [RS_251]
>                     Group By Operator [GBY_246] (rows=1 width=120)
>                       Output:["_col0","_col1"],aggregations:["sum(ss_net_paid_inc_tax)","count(ss_net_paid_inc_tax)"]
>                       Select Operator [SEL_241] (rows=182855757 width=110)
>                         Output:["ss_net_paid_inc_tax"]
>                         Filter Operator [FIL_236] (rows=182855757 width=110)
>                           predicate:ss_quantity BETWEEN 81 AND 100
>                            Please refer to the previous TableScan [TS_80]
> {code}
> {code}
> TableScan [TS_80] (rows=575995635 width=110)
> default@store_sales,store_sales,Tbl:COMPLETE,Col:COMPLETE,Output:["ss_quantity","ss_net_paid_inc_tax"]
> {code}
> {code}
> TableScan [TS_73] (rows=575995635 width=110)
> default@store_sales,store_sales,Tbl:COMPLETE,Col:COMPLETE,Output:["ss_quantity","ss_ext_list_price"]
> {code}
> {code}
> TableScan [TS_66] (rows=575995635 width=3)
> default@store_sales,store_sales,Tbl:COMPLETE,Col:COMPLETE,Output:["ss_quantity"]
> {code}
> Table *store_sales* is read by three TableScans. The difference between them is the set of projected columns.
> The goal of this patch is to merge those TableScan operators into one that projects the union of the columns of all three original TSs.
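Below is a toy Python model of the idea, under stated assumptions. The scans are reduced to hypothetical `(operator id, table, columns)` tuples taken from the plan above. The real SharedWorkOptimizer's mergeability checks are more involved and are not modeled here. If scans are grouped by table plus exact column list, three scans remain. If they are grouped by table alone and the union of the projected columns is taken, one scan remains. `group_scans` and `merge_by_table` are illustrative names, not Hive APIs.

```python
# Hypothetical descriptors of the three TableScans from the plan: (operator id, table, projected columns).
SCANS = [
    ("TS_66", "default@store_sales", ["ss_quantity"]),
    ("TS_73", "default@store_sales", ["ss_quantity", "ss_ext_list_price"]),
    ("TS_80", "default@store_sales", ["ss_quantity", "ss_net_paid_inc_tax"]),
]

def group_scans(scans, key):
    # Bucket the scans that the given key function considers mergeable.
    groups = {}
    for op_id, table, cols in scans:
        groups.setdefault(key(table, cols), []).append(op_id)
    return groups

def merge_by_table(scans):
    # Merge all scans of the same table and project the union of their columns,
    # keeping first-seen column order so the result is deterministic.
    merged = {}
    for _, table, cols in scans:
        union = merged.setdefault(table, [])
        for c in cols:
            if c not in union:
                union.append(c)
    return merged

# Requiring identical column lists keeps three separate scans of store_sales...
print(len(group_scans(SCANS, lambda t, c: (t, tuple(c)))))  # 3
# ...while taking the union of the columns leaves a single scan.
print(merge_by_table(SCANS))
# {'default@store_sales': ['ss_quantity', 'ss_ext_list_price', 'ss_net_paid_inc_tax']}
```

The expected benefit is that `store_sales` (rows=575995635 in the plan) would be read once instead of three times.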



--
This message was sent by Atlassian Jira
(v8.3.4#803005)