You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hawq.apache.org by "Ruilong Huo (JIRA)" <ji...@apache.org> on 2016/06/30 08:25:10 UTC

[jira] [Assigned] (HAWQ-884) Subquery scan return no tuple in query with CTE

     [ https://issues.apache.org/jira/browse/HAWQ-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruilong Huo reassigned HAWQ-884:
--------------------------------

    Assignee: Ruilong Huo  (was: Lei Chang)

> Subquery scan return no tuple in query with CTE
> -----------------------------------------------
>
>                 Key: HAWQ-884
>                 URL: https://issues.apache.org/jira/browse/HAWQ-884
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Query Execution
>            Reporter: Ruilong Huo
>            Assignee: Ruilong Huo
>
> Here is the CTE query that return no tuple, while it should return 1 tuple actually.
> {noformat}
> WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ), 
>       t2 AS ( SELECT 3 c3 )
> SELECT * FROM t1
> WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);                                                                                                                                                                                            
> -- Actual
> c1 | c2
> ----+----
> (0 rows)
> -- Expected
>  c1 | c2
> ----+----
>   3 |  4
> (1 row)
> {noformat}
> The root cause is that during query planning, it is correct that t2 in the CTE (common table expression) clause is treated as subquery scan and then materialized.
> However, during query execution, it generate no tuple when t2 is evaluated. Thus, the join of t1 with t2 generate no tuple.
> We can see this in query execution statistics using explain analyze while running the query with optimizer = off.
> {noformat}
>    ->  Materialize  (cost=0.00..0.01 rows=1 width=0)
>          Rows out:  0 rows with 0.167 ms to end of 3 scans, start offset by 0.265 ms.
>          ->  Limit  (cost=0.00..0.00 rows=1 width=0)
>                Rows out:  0 rows with 0.003 ms to end, start offset by 0.257 ms.
>                ->  Subquery Scan t2  (cost=0.00..0.01 rows=1 width=0)
>                      Rows out:  0 rows with 0.002 ms to end, start offset by 0.258 ms.
>                      ->  Result  (cost=0.00..0.01 rows=1 width=0)
>                            One-Time Filter: 3 = $0
>                            Rows out:  0 rows with 0.001 ms to end, start offset by 0.258 ms.
> {noformat}
> Here is details:
> 1) hawq 2.0 with optimizer off (planner): subquery scan generate no tuple
> {noformat}
> show optimizer;
>  optimizer
> -----------
>  off
> (1 row)
> WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
>      t2 AS ( SELECT 3 c3 )
> SELECT * FROM t1
> WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
>  c1 | c2
> ----+----
> (0 rows)
> EXPLAIN ANALYZE
> WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
>      t2 AS ( SELECT 3 c3 )
> SELECT * FROM t1
> WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
>                                                                             QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=0.05..0.29 rows=72 width=8)
>    Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.237/0.237 ms to end.
>    ->  Limit  (cost=0.00..0.00 rows=1 width=0)
>          Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.003/0.003 ms to end.
>          ->  Subquery Scan t2  (cost=0.00..0.01 rows=6 width=0)
>                Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.002/0.002 ms to end.
>                ->  Result  (cost=0.00..0.01 rows=1 width=0)
>                      One-Time Filter: 3 = $0
>                      Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0/0 ms to end.
>    ->  Materialize  (cost=0.05..0.17 rows=12 width=8)
>          Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0.129/0.129 ms to end, start offset by 0.135/0.135 ms.
>          ->  Append  (cost=0.00..0.04 rows=2 width=0)
>                Rows out:  Avg 2.0 rows x 1 workers.  Max/Last(/) 2/2 rows with 0.002/0.002 ms to first row, 0.004/0.004 ms to end, start offset by 0.255/0.255 ms.
>                ->  Result  (cost=0.00..0.01 rows=1 width=0)
>                      Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0.002/0.002 ms to end, start offset by 0.255/0.255 ms.
>                ->  Result  (cost=0.00..0.01 rows=1 width=0)
>                      Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0/0 ms to end, start offset by 0.261/0.261 ms.
>  Slice statistics:
>    (slice0)    Executor memory: 61K bytes.
>  Statement statistics:
>    Memory used: 128000K bytes
>  Settings:  default_hash_table_bucket_number=6; optimizer=off
>  Optimizer status: legacy query optimizer
>  Data locality statistics:
>    no data locality information in this query
>  Total runtime: 0.372 ms
> (26 rows)
> {noformat}
> 2) hawq 2.0 with optimizer on (orca): fallback to planner and thus subquery scan generate no tuple
> {noformat}
> show optimizer;
>  optimizer
> -----------
>  on
> (1 row)
> WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
>      t2 AS ( SELECT 3 c3 )
> SELECT * FROM t1
> WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
>  c1 | c2
> ----+----
> (0 rows)
> EXPLAIN ANALYZE
> WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
>      t2 AS ( SELECT 3 c3 )
> SELECT * FROM t1
> WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
>                                                                             QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=0.05..0.29 rows=72 width=8)
>    Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.273/0.273 ms to end.
>    ->  Limit  (cost=0.00..0.00 rows=1 width=0)
>          Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.003/0.003 ms to end.
>          ->  Subquery Scan t2  (cost=0.00..0.01 rows=6 width=0)
>                Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.002/0.002 ms to end.
>                ->  Result  (cost=0.00..0.01 rows=1 width=0)
>                      One-Time Filter: 3 = $0
>                      Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.001/0.001 ms to end.
>    ->  Materialize  (cost=0.05..0.17 rows=12 width=8)
>          Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0.151/0.151 ms to end, start offset by 0.189/0.189 ms.
>          ->  Append  (cost=0.00..0.04 rows=2 width=0)
>                Rows out:  Avg 2.0 rows x 1 workers.  Max/Last(/) 2/2 rows with 0.003/0.003 ms to first row, 0.004/0.004 ms to end, start offset by 0.327/0.327 ms.
>                ->  Result  (cost=0.00..0.01 rows=1 width=0)
>                      Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0.002/0.002 ms to end, start offset by 0.327/0.327 ms.
>                ->  Result  (cost=0.00..0.01 rows=1 width=0)
>                      Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0/0 ms to end, start offset by 0.337/0.337 ms.
>  Slice statistics:
>    (slice0)    Executor memory: 61K bytes.
>  Statement statistics:
>    Memory used: 128000K bytes
>  Settings:  default_hash_table_bucket_number=6
>  Optimizer status: legacy query optimizer
>  Data locality statistics:
>    no data locality information in this query
>  Total runtime: 0.468 ms
> (26 rows)
> {noformat}
> 3) hawq 1.x with optimizer off (planner): subquery scan generate no tuple
> {noformat}
> show optimizer;
>  optimizer
> -----------
>  off
> (1 row)
> WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
>      t2 AS ( SELECT 3 c3 )
> SELECT * FROM t1
> WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
>  c1 | c2
> ----+----
> (0 rows)
> EXPLAIN ANALYZE
> WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
>      t2 AS ( SELECT 3 c3 )
> SELECT * FROM t1
> WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
>                                             QUERY PLAN
> --------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..0.08 rows=4 width=8)
>    Rows out:  0 rows with 0.220 ms to end, start offset by 0.093 ms.
>    ->  Append  (cost=0.00..0.04 rows=2 width=0)
>          Rows out:  2 rows with 0.001 ms to first row, 0.003 ms to end, start offset by 0.262 ms.
>          ->  Result  (cost=0.00..0.01 rows=1 width=0)
>                Rows out:  1 rows with 0.001 ms to end, start offset by 0.262 ms.
>          ->  Result  (cost=0.00..0.01 rows=1 width=0)
>                Rows out:  1 rows with 0.001 ms to end, start offset by 0.264 ms.
>    ->  Materialize  (cost=0.00..0.01 rows=1 width=0)
>          Rows out:  0 rows with 0.167 ms to end of 3 scans, start offset by 0.265 ms.
>          ->  Limit  (cost=0.00..0.00 rows=1 width=0)
>                Rows out:  0 rows with 0.003 ms to end, start offset by 0.257 ms.
>                ->  Subquery Scan t2  (cost=0.00..0.01 rows=1 width=0)
>                      Rows out:  0 rows with 0.002 ms to end, start offset by 0.258 ms.
>                      ->  Result  (cost=0.00..0.01 rows=1 width=0)
>                            One-Time Filter: 3 = $0
>                            Rows out:  0 rows with 0.001 ms to end, start offset by 0.258 ms.
>  Slice statistics:
>    (slice0)    Executor memory: 61K bytes.
>  Statement statistics:
>    Memory used: 128000K bytes
>  Settings:  optimizer=off
>  Optimizer status: legacy query optimizer
>  Total runtime: 0.315 ms
> (24 rows)
> {noformat}
> 4) hawq 1.x with optimizer on (orca): work as expected
> {noformat}
> show optimizer;
>  optimizer
> -----------
>  on
> (1 row)
> WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
>      t2 AS ( SELECT 3 c3 )
> SELECT * FROM t1
> WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
>  c1 | c2
> ----+----
>   3 |  4
> (1 row)
> EXPLAIN ANALYZE
> WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
>      t2 AS ( SELECT 3 c3 )
> SELECT * FROM t1
> WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
>                                                QUERY PLAN
> --------------------------------------------------------------------------------------------------------
>  Hash EXISTS Join  (cost=0.00..0.00 rows=2 width=8)
>    Hash Cond: "outer".c1 = "inner".c3
>    Rows out:  1 rows with 0.835 ms to first row, 2.373 ms to end, start offset by 0.194 ms.
>    Executor memory:  1K bytes.
>    Work_mem used:  1K bytes. Workfile: (0 spilling, 0 reused)
>    Hash chain length 1.0 avg, 1 max, using 1 of 524341 buckets.
>    ->  Append  (cost=0.00..0.00 rows=2 width=8)
>          Rows out:  2 rows with 0.002 ms to first row, 0.004 ms to end, start offset by 1.023 ms.
>          ->  Result  (cost=0.00..0.00 rows=1 width=8)
>                Rows out:  1 rows with 0.001 ms to first row, 0.002 ms to end, start offset by 1.023 ms.
>                ->  Result  (cost=0.00..0.00 rows=1 width=1)
>                      Rows out:  1 rows with 0 ms to end, start offset by 1.024 ms.
>          ->  Result  (cost=0.00..0.00 rows=1 width=8)
>                Rows out:  1 rows with 0.001 ms to end, start offset by 1.026 ms.
>                ->  Result  (cost=0.00..0.00 rows=1 width=1)
>                      Rows out:  1 rows with 0 ms to end, start offset by 1.026 ms.
>    ->  Hash  (cost=0.00..0.00 rows=1 width=4)
>          Rows in:  1 rows with 0.010 ms to end, start offset by 1.013 ms.
>          ->  Result  (cost=0.00..0.00 rows=1 width=4)
>                Rows out:  1 rows with 0.006 ms to first row, 0.007 ms to end, start offset by 1.013 ms.
>                ->  Result  (cost=0.00..0.00 rows=1 width=1)
>                      Rows out:  1 rows with 0.001 ms to end, start offset by 1.014 ms.
>  Slice statistics:
>    (slice0)    Executor memory: 8270K bytes.  Work_mem: 1K bytes max.
>  Statement statistics:
>    Memory used: 128000K bytes
>  Optimizer status: PQO version 1.591
>  Total runtime: 2.572 ms
> (28 rows)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)