You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/02/04 03:40:33 UTC

[GitHub] [incubator-doris] EmmyMiao87 opened a new issue #2830: Support subquery in disjunction

EmmyMiao87 opened a new issue #2830: Support subquery in disjunction
URL: https://github.com/apache/incubator-doris/issues/2830
 
 
   **Is your feature request related to a problem? Please describe.**
   Doris does not support the subquery in disjunction.
   For example:
   ```
   SELECT *
   FROM customer c
   WHERE   EXISTS
           (
                 SELECT *
                 FROM web_sales
                 WHERE c.c_customer_sk = ws_bill_customer_sk
           )
           OR EXISTS
           (
                 SELECT *
                 FROM catalog_sales
                 WHERE c.c_customer_sk = cs_ship_customer_sk
   
           );
   ```
   ```
   SELECT *
   FROM  customer_address, item
   WHERE
       substr(ca_zip, 1, 5)
   
       IN
   
       (
            '85669'
       )
       OR i_item_id
   
       IN
   
       (
             SELECT i_item_id
             FROM item
             WHERE i_item_sk IN (2)
       );
   ```
   Subqueries in OR predicates are not supported
   
   TPC-DS 10,35,45 query:
   ```
   query10:
   SELECT cd_gender,
         cd_marital_status,
         cd_education_status,
         count(*) cnt1,
         cd_purchase_estimate,
         count(*) cnt2,
         cd_credit_rating,
         count(*) cnt3,
         cd_dep_count,
         count(*) cnt4,
         cd_dep_employed_count,
         count(*) cnt5,
         cd_dep_college_count,
         count(*) cnt6
   FROM customer c,customer_address ca,customer_demographics
   WHERE c.c_current_addr_sk = ca.ca_address_sk
         AND ca_county IN ('Walker County','Richland County','Gaines County','Douglas County','Dona Ana County')
         AND cd_demo_sk = c.c_current_cdemo_sk
         AND EXISTS
         (
   
               SELECT *
               FROM store_sales,date_dim
               WHERE c.c_customer_sk = ss_customer_sk
                     AND ss_sold_date_sk = d_date_sk
                     AND d_year = 2002
                    AND d_moy
                     BETWEEN 4
                     AND 4+3
   
         )
         AND
   
         (
   
               EXISTS
               (
   
                     SELECT *
                     FROM web_sales,date_dim
                     WHERE c.c_customer_sk = ws_bill_customer_sk
                           AND ws_sold_date_sk = d_date_sk
                           AND d_year = 2002
                           AND d_moy
                     BETWEEN 4
                     AND 4+3
   
               )
               OR EXISTS
               (
   
                     SELECT *
                     FROM catalog_sales,date_dim
                     WHERE c.c_customer_sk = cs_ship_customer_sk
                           AND cs_sold_date_sk = d_date_sk
                           AND d_year = 2002
                           AND d_moy
                           BETWEEN 4
                           AND 4+3
   
               )
   
         )
   GROUP BY cd_gender, cd_marital_status, cd_education_status, cd_purchase_estimate, cd_credit_rating, cd_dep_count, cd_dep_employed_count, cd_dep_college_count
   ORDER BY cd_gender, cd_marital_status, cd_education_status, cd_purchase_estimate, cd_credit_rating, cd_dep_count, cd_dep_employed_count, cd_dep_college_count
   
   LIMIT 100;
   ```
   
   ```
   query 35:
   SELECT ca_state,
         cd_gender,
         cd_marital_status,
         cd_dep_count,
         count(*) cnt1,
         avg(cd_dep_count),
         max(cd_dep_count),
         sum(cd_dep_count),
         cd_dep_employed_count,
         count(*) cnt2,
         avg(cd_dep_employed_count),
         max(cd_dep_employed_count),
         sum(cd_dep_employed_count),
         cd_dep_college_count,
         count(*) cnt3,
         avg(cd_dep_college_count),
         max(cd_dep_college_count),
         sum(cd_dep_college_count)
   FROM customer c,customer_address ca,customer_demographics
   WHERE c.c_current_addr_sk = ca.ca_address_sk
         AND cd_demo_sk = c.c_current_cdemo_sk
         AND EXISTS
         (
   
               SELECT *
               FROM store_sales,date_dim
               WHERE c.c_customer_sk = ss_customer_sk
                     AND ss_sold_date_sk = d_date_sk
                     AND d_year = 1999
                     AND d_qoy < 4
   
         )
         AND
   
         (
   
               EXISTS
               (
   
                     SELECT *
                     FROM web_sales,date_dim
                     WHERE c.c_customer_sk = ws_bill_customer_sk
                           AND ws_sold_date_sk = d_date_sk
                           AND d_year = 1999
                           AND d_qoy < 4
   
               )
               OR EXISTS
               (
   
                     SELECT *
                     FROM catalog_sales,date_dim
                     WHERE c.c_customer_sk = cs_ship_customer_sk
                           AND cs_sold_date_sk = d_date_sk
                           AND d_year = 1999
                           AND d_qoy < 4
   
               )
   
         )
   GROUP BY ca_state, cd_gender, cd_marital_status, cd_dep_count, cd_dep_employed_count, cd_dep_college_count
   ORDER BY ca_state, cd_gender, cd_marital_status, cd_dep_count, cd_dep_employed_count, cd_dep_college_count
   
   LIMIT 100;
   ```
   
   ```
   Query 45:
   SELECT ca_zip, ca_county, SUM(ws_sales_price)
   FROM web_sales, customer, customer_address, date_dim, item
   WHERE
   
         (
   
               ws_bill_customer_sk = c_customer_sk
              AND c_current_addr_sk = ca_address_sk
              AND ws_item_sk = i_item_sk
              AND
   
              (
   
                     substr(ca_zip, 1, 5)
   
                     IN
   
                     (
                          '85669',
                          '86197',
                          '88274',
                          '83405',
                           '86475',
                           '85392',
                           '85460',
                          '80348',
                           '81792'
                    )
                    OR i_item_id
   
                     IN
   
                     (
                           SELECT i_item_id
                           FROM item
                           WHERE i_item_sk IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
                    )
   
               )
               AND ws_sold_date_sk = d_date_sk
              AND d_qoy = 2
              AND d_year = 2000
   
         )
   GROUP BY ca_zip, ca_county
   ORDER BY ca_zip, ca_county
   LIMIT 100;
   ```
   
   **Describe the solution you'd like**
   The subquery could be supported inn disjunction.
   Select * from t1 where a in subquery1 or a b in subquery2.
   
   **Competitive analysis**
   Greenplum:
   Impala:
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] EmmyMiao87 edited a comment on issue #2830: Support subquery in disjunction

Posted by GitBox <gi...@apache.org>.
EmmyMiao87 edited a comment on issue #2830: Support subquery in disjunction
URL: https://github.com/apache/incubator-doris/issues/2830#issuecomment-581809110
 
 
   Greenplum support the correlated subquery in disjunctive(OR) filter by GPORCA.
   
   If the correlated subquery in conjunctive filter, the query will be rewritten to semi join.
   If the correlated subquery in disjunctive filter, the query will be rewritten to left outer join in Greenplum.
   For example:
   select * from a where exist (select * from b where a.k1=b.k1) or exist (select * from c where a.k1=c.k1);
   1. a left outer join b on a.k1=b.k1
   2. the result of 1 left outer join c on a.k1=c.k1
   the result of 2 like that
   
   | a.k1 | b.k1 | c.k1 |
   | :-----| :---- | :----|
   | 1 | 1 | NULL |
   | 2 | NULL | 2 |
   | 3 | 3 | 3|
   | 4 | NULL | NULL|
   3. filter the incorrect line
   The first, second and third line should be keep.
   4. keep the outer table columns
   keep the columns in table A
   Here is a question. What does the following mean?
   `((COALESCE((count((count()))), 0::bigint) > 0::bigint) OR (COALESCE((count((count()))), 0::bigint) > 0::bigint))`
   
   For query 
   ```
   SELECT *
   FROM customer c
   WHERE   EXISTS
           (
                 SELECT *
                 FROM web_sales
                 WHERE c.c_customer_sk = ws_bill_customer_sk
           )
           OR EXISTS
           (
                 SELECT *
                 FROM catalog_sales
                 WHERE c.c_customer_sk = cs_ship_customer_sk
   
           );
   ```
   The explain result like that 
   ```
   tpcds=# explain SELECT *
   tpcds-# FROM customer c
   tpcds-# WHERE   EXISTS
   tpcds-#         (
   tpcds(#               SELECT *
   tpcds(#               FROM web_sales
   tpcds(#               WHERE c.c_customer_sk = ws_bill_customer_sk
   tpcds(#         )
   tpcds-#         OR EXISTS
   tpcds-#         (
   tpcds(#               SELECT *
   tpcds(#               FROM catalog_sales
   tpcds(#               WHERE c.c_customer_sk = cs_ship_customer_sk
   tpcds(#
   tpcds(#         );
                                                                               QUERY PLAN
   ------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Gather Motion 10:1  (slice3; segments: 10)  (cost=0.00..60357.42 rows=13500000 width=113)
      ->  Result  (cost=0.00..56302.64 rows=1350000 width=113)
            Filter: ((COALESCE((count((count()))), 0::bigint) > 0::bigint) OR (COALESCE((count((count()))), 0::bigint) > 0::bigint))
            ->  Result  (cost=0.00..56118.17 rows=2803591 width=120)
                  ->  Hash Left Join  (cost=0.00..55781.73 rows=2803591 width=129)
                        Hash Cond: (customer.c_customer_sk = catalog_sales.cs_ship_customer_sk)
                        ->  Hash Left Join  (cost=0.00..19740.78 rows=1613115 width=121)
                              Hash Cond: (customer.c_customer_sk = web_sales.ws_bill_customer_sk)
                              ->  Seq Scan on customer  (cost=0.00..514.82 rows=1200000 width=113)
                              ->  Hash  (cost=17560.85..17560.85 rows=1177761 width=12)
                                    ->  HashAggregate  (cost=0.00..17560.85 rows=1177761 width=12)
                                          Group Key: web_sales.ws_bill_customer_sk
                                          ->  Redistribute Motion 10:10  (slice1; segments: 10)  (cost=0.00..17410.01 rows=1177761 width=12)
                                                Hash Key: web_sales.ws_bill_customer_sk
                                                ->  Result  (cost=0.00..17365.77 rows=1177761 width=12)
                                                      ->  HashAggregate  (cost=0.00..17365.77 rows=1177761 width=12)
                                                            Group Key: web_sales.ws_bill_customer_sk
                                                            ->  Sequence  (cost=0.00..7559.00 rows=72000038 width=4)
                                                                  ->  Partition Selector for web_sales (dynamic scan id: 1)  (cost=10.00..100.00 rows=10 width=4)
                                                                        Partitions selected: 56 (out of 56)
                                                                  ->  Dynamic Seq Scan on web_sales (dynamic scan id: 1)  (cost=0.00..7559.00 rows=72000038 width=4)
                        ->  Hash  (cost=33936.95..33936.95 rows=1178712 width=12)
                              ->  HashAggregate  (cost=0.00..33936.95 rows=1178712 width=12)
                                    Group Key: catalog_sales.cs_ship_customer_sk
                                    ->  Redistribute Motion 10:10  (slice2; segments: 10)  (cost=0.00..33785.99 rows=1178712 width=12)
                                          Hash Key: catalog_sales.cs_ship_customer_sk
                                          ->  Result  (cost=0.00..33741.71 rows=1178712 width=12)
                                                ->  HashAggregate  (cost=0.00..33741.71 rows=1178712 width=12)
                                                      Group Key: catalog_sales.cs_ship_customer_sk
                                                      ->  Sequence  (cost=0.00..14132.41 rows=143998042 width=4)
                                                            ->  Partition Selector for catalog_sales (dynamic scan id: 2)  (cost=10.00..100.00 rows=10 width=4)
                                                                  Partitions selected: 80 (out of 80)
                                                            ->  Dynamic Seq Scan on catalog_sales (dynamic scan id: 2)  (cost=0.00..14132.41 rows=143998042 width=4)
    Optimizer: Pivotal Optimizer (GPORCA) version 3.80.0
   (34 rows)
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] EmmyMiao87 edited a comment on issue #2830: Support subquery in disjunction

Posted by GitBox <gi...@apache.org>.
EmmyMiao87 edited a comment on issue #2830: Support subquery in disjunction
URL: https://github.com/apache/incubator-doris/issues/2830#issuecomment-581809110
 
 
   Greenplum support the correlated subquery in disjunctive(OR) filter by GPORCA.
   
   If the correlated subquery in conjunctive filter, the query will be rewritten to semi join.
   If the correlated subquery in disjunctive filter, the query will be rewritten to left outer join in Greenplum.
   For example:
   select * from a where exist (select * from b where a.k1=b.k1) or exist (select * from c where a.k1=c.k1);
   1. a left outer join b on a.k1=b.k1
   2. the result of 1 left outer join c on a.k1=c.k1
   the result of 2 like that
   
   | a.k1 | b.k1 | c.k1 |
   | :-----| :---- | :----|
   | 1 | 1 | NULL |
   | 2 | NULL | 2 |
   | 3 | 3 | 3|
   | 4 | NULL | NULL|
   3. filter the incorrect line
   The first, second and third line should be keep.
   4. keep the outer table columns
   keep the columns in table A
   Here is a question. What does the following mean?
   `((COALESCE((count((count()))), 0::bigint) > 0::bigint) OR (COALESCE((count((count()))), 0::bigint) > 0::bigint))`
   
   For query 
   ```
   SELECT *
   FROM customer c
   WHERE   EXISTS
           (
                 SELECT *
                 FROM web_sales
                 WHERE c.c_customer_sk = ws_bill_customer_sk
           )
           OR EXISTS
           (
                 SELECT *
                 FROM catalog_sales
                 WHERE c.c_customer_sk = cs_ship_customer_sk
   
           );
   ```
   The explain result like that 
   ```
   tpcds=# explain SELECT *
   tpcds-# FROM customer c
   tpcds-# WHERE   EXISTS
   tpcds-#         (
   tpcds(#               SELECT *
   tpcds(#               FROM web_sales
   tpcds(#               WHERE c.c_customer_sk = ws_bill_customer_sk
   tpcds(#         )
   tpcds-#         OR EXISTS
   tpcds-#         (
   tpcds(#               SELECT *
   tpcds(#               FROM catalog_sales
   tpcds(#               WHERE c.c_customer_sk = cs_ship_customer_sk
   tpcds(#
   tpcds(#         );
                                                                               QUERY PLAN
   ---------------------------------------------------------------------------------------------------
    Gather Motion 10:1  (slice3; segments: 10)  (cost=0.00..60357.42 rows=13500000 width=113)
      ->  Result  (cost=0.00..56302.64 rows=1350000 width=113)
            Filter: ((COALESCE((count((count()))), 0::bigint) > 0::bigint) OR (COALESCE((count((count()))), 0::bigint) > 0::bigint))
            ->  Result  (cost=0.00..56118.17 rows=2803591 width=120)
                  ->  Hash Left Join  (cost=0.00..55781.73 rows=2803591 width=129)
                        Hash Cond: (customer.c_customer_sk = catalog_sales.cs_ship_customer_sk)
                        ->  Hash Left Join  (cost=0.00..19740.78 rows=1613115 width=121)
                              Hash Cond: (customer.c_customer_sk = web_sales.ws_bill_customer_sk)
                              ->  Seq Scan on customer  (cost=0.00..514.82 rows=1200000 width=113)
                              ->  Hash  (cost=17560.85..17560.85 rows=1177761 width=12)
                                    ->  HashAggregate  (cost=0.00..17560.85 rows=1177761 width=12)
                                          Group Key: web_sales.ws_bill_customer_sk
                                          ->  Redistribute Motion 10:10  (slice1; segments: 10)  (cost=0.00..17410.01 rows=1177761 width=12)
                                                Hash Key: web_sales.ws_bill_customer_sk
                                                ->  Result  (cost=0.00..17365.77 rows=1177761 width=12)
                                                      ->  HashAggregate  (cost=0.00..17365.77 rows=1177761 width=12)
                                                            Group Key: web_sales.ws_bill_customer_sk
                                                            ->  Sequence  (cost=0.00..7559.00 rows=72000038 width=4)
                                                                  ->  Partition Selector for web_sales (dynamic scan id: 1)  (cost=10.00..100.00 rows=10 width=4)
                                                                        Partitions selected: 56 (out of 56)
                                                                  ->  Dynamic Seq Scan on web_sales (dynamic scan id: 1)  (cost=0.00..7559.00 rows=72000038 width=4)
                        ->  Hash  (cost=33936.95..33936.95 rows=1178712 width=12)
                              ->  HashAggregate  (cost=0.00..33936.95 rows=1178712 width=12)
                                    Group Key: catalog_sales.cs_ship_customer_sk
                                    ->  Redistribute Motion 10:10  (slice2; segments: 10)  (cost=0.00..33785.99 rows=1178712 width=12)
                                          Hash Key: catalog_sales.cs_ship_customer_sk
                                          ->  Result  (cost=0.00..33741.71 rows=1178712 width=12)
                                                ->  HashAggregate  (cost=0.00..33741.71 rows=1178712 width=12)
                                                      Group Key: catalog_sales.cs_ship_customer_sk
                                                      ->  Sequence  (cost=0.00..14132.41 rows=143998042 width=4)
                                                            ->  Partition Selector for catalog_sales (dynamic scan id: 2)  (cost=10.00..100.00 rows=10 width=4)
                                                                  Partitions selected: 80 (out of 80)
                                                            ->  Dynamic Seq Scan on catalog_sales (dynamic scan id: 2)  (cost=0.00..14132.41 rows=143998042 width=4)
    Optimizer: Pivotal Optimizer (GPORCA) version 3.80.0
   (34 rows)
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] EmmyMiao87 edited a comment on issue #2830: Support subquery in disjunction

Posted by GitBox <gi...@apache.org>.
EmmyMiao87 edited a comment on issue #2830: Support subquery in disjunction
URL: https://github.com/apache/incubator-doris/issues/2830#issuecomment-581809110
 
 
   Greenplum support the correlated subquery in disjunctive(OR) filter by GPORCA.
   
   If the correlated subquery in conjunctive filter, the query will be rewritten to semi join.
   If the correlated subquery in disjunctive filter, the query will be rewritten to left outer join in Greenplum.
   For example:
   select * from a where exist (select * from b where a.k1=b.k1) or exist (select * from c where a.k1=c.k1);
   1. a left outer join b on a.k1=b.k1
   2. the result of 1 left outer join c on a.k1=c.k1
   the result of 2 like that
   | a.k1 | b.k1 | c.k1 |
   | :-----| :---- | :----|
   | 1 | 1 | NULL |
   | 2 | NULL | 2 |
   | 3 | 3 | 3|
   | 4 | NULL | NULL|
   3. filter the incorrect line
   The first, second and third line should be keep.
   4. keep the outer table columns
   keep the columns in table A
   Here is a question. What does the following mean?
   `((COALESCE((count((count()))), 0::bigint) > 0::bigint) OR (COALESCE((count((count()))), 0::bigint) > 0::bigint))`
   
   For query 
   ```
   SELECT *
   FROM customer c
   WHERE   EXISTS
           (
                 SELECT *
                 FROM web_sales
                 WHERE c.c_customer_sk = ws_bill_customer_sk
           )
           OR EXISTS
           (
                 SELECT *
                 FROM catalog_sales
                 WHERE c.c_customer_sk = cs_ship_customer_sk
   
           );
   ```
   The explain result like that 
   ```
   tpcds=# explain SELECT *
   tpcds-# FROM customer c
   tpcds-# WHERE   EXISTS
   tpcds-#         (
   tpcds(#               SELECT *
   tpcds(#               FROM web_sales
   tpcds(#               WHERE c.c_customer_sk = ws_bill_customer_sk
   tpcds(#         )
   tpcds-#         OR EXISTS
   tpcds-#         (
   tpcds(#               SELECT *
   tpcds(#               FROM catalog_sales
   tpcds(#               WHERE c.c_customer_sk = cs_ship_customer_sk
   tpcds(#
   tpcds(#         );
                                                                               QUERY PLAN
   ------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Gather Motion 10:1  (slice3; segments: 10)  (cost=0.00..60357.42 rows=13500000 width=113)
      ->  Result  (cost=0.00..56302.64 rows=1350000 width=113)
            Filter: ((COALESCE((count((count()))), 0::bigint) > 0::bigint) OR (COALESCE((count((count()))), 0::bigint) > 0::bigint))
            ->  Result  (cost=0.00..56118.17 rows=2803591 width=120)
                  ->  Hash Left Join  (cost=0.00..55781.73 rows=2803591 width=129)
                        Hash Cond: (customer.c_customer_sk = catalog_sales.cs_ship_customer_sk)
                        ->  Hash Left Join  (cost=0.00..19740.78 rows=1613115 width=121)
                              Hash Cond: (customer.c_customer_sk = web_sales.ws_bill_customer_sk)
                              ->  Seq Scan on customer  (cost=0.00..514.82 rows=1200000 width=113)
                              ->  Hash  (cost=17560.85..17560.85 rows=1177761 width=12)
                                    ->  HashAggregate  (cost=0.00..17560.85 rows=1177761 width=12)
                                          Group Key: web_sales.ws_bill_customer_sk
                                          ->  Redistribute Motion 10:10  (slice1; segments: 10)  (cost=0.00..17410.01 rows=1177761 width=12)
                                                Hash Key: web_sales.ws_bill_customer_sk
                                                ->  Result  (cost=0.00..17365.77 rows=1177761 width=12)
                                                      ->  HashAggregate  (cost=0.00..17365.77 rows=1177761 width=12)
                                                            Group Key: web_sales.ws_bill_customer_sk
                                                            ->  Sequence  (cost=0.00..7559.00 rows=72000038 width=4)
                                                                  ->  Partition Selector for web_sales (dynamic scan id: 1)  (cost=10.00..100.00 rows=10 width=4)
                                                                        Partitions selected: 56 (out of 56)
                                                                  ->  Dynamic Seq Scan on web_sales (dynamic scan id: 1)  (cost=0.00..7559.00 rows=72000038 width=4)
                        ->  Hash  (cost=33936.95..33936.95 rows=1178712 width=12)
                              ->  HashAggregate  (cost=0.00..33936.95 rows=1178712 width=12)
                                    Group Key: catalog_sales.cs_ship_customer_sk
                                    ->  Redistribute Motion 10:10  (slice2; segments: 10)  (cost=0.00..33785.99 rows=1178712 width=12)
                                          Hash Key: catalog_sales.cs_ship_customer_sk
                                          ->  Result  (cost=0.00..33741.71 rows=1178712 width=12)
                                                ->  HashAggregate  (cost=0.00..33741.71 rows=1178712 width=12)
                                                      Group Key: catalog_sales.cs_ship_customer_sk
                                                      ->  Sequence  (cost=0.00..14132.41 rows=143998042 width=4)
                                                            ->  Partition Selector for catalog_sales (dynamic scan id: 2)  (cost=10.00..100.00 rows=10 width=4)
                                                                  Partitions selected: 80 (out of 80)
                                                            ->  Dynamic Seq Scan on catalog_sales (dynamic scan id: 2)  (cost=0.00..14132.41 rows=143998042 width=4)
    Optimizer: Pivotal Optimizer (GPORCA) version 3.80.0
   (34 rows)
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] EmmyMiao87 commented on issue #2830: Support subquery in disjunction

Posted by GitBox <gi...@apache.org>.
EmmyMiao87 commented on issue #2830: Support subquery in disjunction
URL: https://github.com/apache/incubator-doris/issues/2830#issuecomment-581809110
 
 
   Greenplum support the correlated subquery in disjunctive(OR) filter by GPORCA.
   
   If the correlated subquery in conjunctive filter, the query will be rewritten to semi join.
   If the correlated subquery in disjunctive filter, the query will be rewritten to left outer join in Greenplum.
   For example:
   select * from a where exist (select * from b where a.k1=b.k1) or exist (select * from c where a.k1=c.k1);
   1. a left outer join b on a.k1=b.k1
   2. the result of 1 left outer join c on a.k1=c.k1
   the result of 2 like that
   | a.k1 | b.k1 | c.k1 |
   | :-----| :---- | :----|
   | 1 | 1 | NULL |
   | 2 | NULL | 2 |
   | 3 | 3 | 3|
   | 4 | NULL | NULL|
   3. filter the incorrect line
   The first, second and third line should be keep.
   4. keep the outer table columns
   keep the columns in table A
   Here is a question. What does the following mean?
   `((COALESCE((count((count()))), 0::bigint) > 0::bigint) OR (COALESCE((count((count()))), 0::bigint) > 0::bigint))`
   
   The explain result like that 
   ```
   tpcds=# explain SELECT *
   tpcds-# FROM customer c
   tpcds-# WHERE   EXISTS
   tpcds-#         (
   tpcds(#               SELECT *
   tpcds(#               FROM web_sales
   tpcds(#               WHERE c.c_customer_sk = ws_bill_customer_sk
   tpcds(#         )
   tpcds-#         OR EXISTS
   tpcds-#         (
   tpcds(#               SELECT *
   tpcds(#               FROM catalog_sales
   tpcds(#               WHERE c.c_customer_sk = cs_ship_customer_sk
   tpcds(#
   tpcds(#         );
                                                                               QUERY PLAN
   ------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Gather Motion 10:1  (slice3; segments: 10)  (cost=0.00..60357.42 rows=13500000 width=113)
      ->  Result  (cost=0.00..56302.64 rows=1350000 width=113)
            Filter: ((COALESCE((count((count()))), 0::bigint) > 0::bigint) OR (COALESCE((count((count()))), 0::bigint) > 0::bigint))
            ->  Result  (cost=0.00..56118.17 rows=2803591 width=120)
                  ->  Hash Left Join  (cost=0.00..55781.73 rows=2803591 width=129)
                        Hash Cond: (customer.c_customer_sk = catalog_sales.cs_ship_customer_sk)
                        ->  Hash Left Join  (cost=0.00..19740.78 rows=1613115 width=121)
                              Hash Cond: (customer.c_customer_sk = web_sales.ws_bill_customer_sk)
                              ->  Seq Scan on customer  (cost=0.00..514.82 rows=1200000 width=113)
                              ->  Hash  (cost=17560.85..17560.85 rows=1177761 width=12)
                                    ->  HashAggregate  (cost=0.00..17560.85 rows=1177761 width=12)
                                          Group Key: web_sales.ws_bill_customer_sk
                                          ->  Redistribute Motion 10:10  (slice1; segments: 10)  (cost=0.00..17410.01 rows=1177761 width=12)
                                                Hash Key: web_sales.ws_bill_customer_sk
                                                ->  Result  (cost=0.00..17365.77 rows=1177761 width=12)
                                                      ->  HashAggregate  (cost=0.00..17365.77 rows=1177761 width=12)
                                                            Group Key: web_sales.ws_bill_customer_sk
                                                            ->  Sequence  (cost=0.00..7559.00 rows=72000038 width=4)
                                                                  ->  Partition Selector for web_sales (dynamic scan id: 1)  (cost=10.00..100.00 rows=10 width=4)
                                                                        Partitions selected: 56 (out of 56)
                                                                  ->  Dynamic Seq Scan on web_sales (dynamic scan id: 1)  (cost=0.00..7559.00 rows=72000038 width=4)
                        ->  Hash  (cost=33936.95..33936.95 rows=1178712 width=12)
                              ->  HashAggregate  (cost=0.00..33936.95 rows=1178712 width=12)
                                    Group Key: catalog_sales.cs_ship_customer_sk
                                    ->  Redistribute Motion 10:10  (slice2; segments: 10)  (cost=0.00..33785.99 rows=1178712 width=12)
                                          Hash Key: catalog_sales.cs_ship_customer_sk
                                          ->  Result  (cost=0.00..33741.71 rows=1178712 width=12)
                                                ->  HashAggregate  (cost=0.00..33741.71 rows=1178712 width=12)
                                                      Group Key: catalog_sales.cs_ship_customer_sk
                                                      ->  Sequence  (cost=0.00..14132.41 rows=143998042 width=4)
                                                            ->  Partition Selector for catalog_sales (dynamic scan id: 2)  (cost=10.00..100.00 rows=10 width=4)
                                                                  Partitions selected: 80 (out of 80)
                                                            ->  Dynamic Seq Scan on catalog_sales (dynamic scan id: 2)  (cost=0.00..14132.41 rows=143998042 width=4)
    Optimizer: Pivotal Optimizer (GPORCA) version 3.80.0
   (34 rows)
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org