You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hawq.apache.org by "Ruilong Huo (JIRA)" <ji...@apache.org> on 2016/06/30 08:24:10 UTC

[jira] [Updated] (HAWQ-884) Subquery scan return no tuple in query with CTE

     [ https://issues.apache.org/jira/browse/HAWQ-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruilong Huo updated HAWQ-884:
-----------------------------
    Description: 
Here is the CTE query that return no tuple, while it should return 1 tuple actually.
{noformat}
WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ), 
          t2 AS ( SELECT 3 c3 )
SELECT * FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);                                                                                                                                                                                            

-- Actual
c1 | c2
----+----
(0 rows)

-- Expected
 c1 | c2
----+----
  3 |  4
(1 row)
{noformat}

The root cause is that during query planning, it is correct that t2 in the CTE (common table expression) clause is treated as subquery scan and then materialized. However, during query execution, it generate no tuple when t2 is evaluated. Thus, the join of t1 with t2 generate no tuple. We can see this in query execution statistics using explain analyze while running the query with optimizer = off.
{noformat}
   ->  Materialize  (cost=0.00..0.01 rows=1 width=0)
         Rows out:  0 rows with 0.167 ms to end of 3 scans, start offset by 0.265 ms.
         ->  Limit  (cost=0.00..0.00 rows=1 width=0)
               Rows out:  0 rows with 0.003 ms to end, start offset by 0.257 ms.
               ->  Subquery Scan t2  (cost=0.00..0.01 rows=1 width=0)
                     Rows out:  0 rows with 0.002 ms to end, start offset by 0.258 ms.
                     ->  Result  (cost=0.00..0.01 rows=1 width=0)
                           One-Time Filter: 3 = $0
                           Rows out:  0 rows with 0.001 ms to end, start offset by 0.258 ms.
{noformat}

Here is details:
1) hawq 2.0 with optimizer off (planner): subquery scan generate no tuple
{noformat}
show optimizer;
 optimizer
-----------
 off
(1 row)

WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
     t2 AS ( SELECT 3 c3 )
SELECT * FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
 c1 | c2
----+----
(0 rows)

EXPLAIN ANALYZE
WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
     t2 AS ( SELECT 3 c3 )
SELECT * FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
                                                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.05..0.29 rows=72 width=8)
   Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.237/0.237 ms to end.
   ->  Limit  (cost=0.00..0.00 rows=1 width=0)
         Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.003/0.003 ms to end.
         ->  Subquery Scan t2  (cost=0.00..0.01 rows=6 width=0)
               Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.002/0.002 ms to end.
               ->  Result  (cost=0.00..0.01 rows=1 width=0)
                     One-Time Filter: 3 = $0
                     Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0/0 ms to end.
   ->  Materialize  (cost=0.05..0.17 rows=12 width=8)
         Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0.129/0.129 ms to end, start offset by 0.135/0.135 ms.
         ->  Append  (cost=0.00..0.04 rows=2 width=0)
               Rows out:  Avg 2.0 rows x 1 workers.  Max/Last(/) 2/2 rows with 0.002/0.002 ms to first row, 0.004/0.004 ms to end, start offset by 0.255/0.255 ms.
               ->  Result  (cost=0.00..0.01 rows=1 width=0)
                     Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0.002/0.002 ms to end, start offset by 0.255/0.255 ms.
               ->  Result  (cost=0.00..0.01 rows=1 width=0)
                     Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0/0 ms to end, start offset by 0.261/0.261 ms.
 Slice statistics:
   (slice0)    Executor memory: 61K bytes.
 Statement statistics:
   Memory used: 128000K bytes
 Settings:  default_hash_table_bucket_number=6; optimizer=off
 Optimizer status: legacy query optimizer
 Data locality statistics:
   no data locality information in this query
 Total runtime: 0.372 ms
(26 rows)
{noformat}
2) hawq 2.0 with optimizer on (orca): fallback to planner and thus subquery scan generate no tuple
{noformat}
show optimizer;
 optimizer
-----------
 on
(1 row)

WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
     t2 AS ( SELECT 3 c3 )
SELECT * FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
 c1 | c2
----+----
(0 rows)

EXPLAIN ANALYZE
WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
     t2 AS ( SELECT 3 c3 )
SELECT * FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
                                                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.05..0.29 rows=72 width=8)
   Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.273/0.273 ms to end.
   ->  Limit  (cost=0.00..0.00 rows=1 width=0)
         Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.003/0.003 ms to end.
         ->  Subquery Scan t2  (cost=0.00..0.01 rows=6 width=0)
               Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.002/0.002 ms to end.
               ->  Result  (cost=0.00..0.01 rows=1 width=0)
                     One-Time Filter: 3 = $0
                     Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.001/0.001 ms to end.
   ->  Materialize  (cost=0.05..0.17 rows=12 width=8)
         Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0.151/0.151 ms to end, start offset by 0.189/0.189 ms.
         ->  Append  (cost=0.00..0.04 rows=2 width=0)
               Rows out:  Avg 2.0 rows x 1 workers.  Max/Last(/) 2/2 rows with 0.003/0.003 ms to first row, 0.004/0.004 ms to end, start offset by 0.327/0.327 ms.
               ->  Result  (cost=0.00..0.01 rows=1 width=0)
                     Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0.002/0.002 ms to end, start offset by 0.327/0.327 ms.
               ->  Result  (cost=0.00..0.01 rows=1 width=0)
                     Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0/0 ms to end, start offset by 0.337/0.337 ms.
 Slice statistics:
   (slice0)    Executor memory: 61K bytes.
 Statement statistics:
   Memory used: 128000K bytes
 Settings:  default_hash_table_bucket_number=6
 Optimizer status: legacy query optimizer
 Data locality statistics:
   no data locality information in this query
 Total runtime: 0.468 ms
(26 rows)
{noformat}
3) hawq 1.x with optimizer off (planner): subquery scan generate no tuple
{noformat}
show optimizer;
 optimizer
-----------
 off
(1 row)

WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
     t2 AS ( SELECT 3 c3 )
SELECT * FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
 c1 | c2
----+----
(0 rows)

EXPLAIN ANALYZE
WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
     t2 AS ( SELECT 3 c3 )
SELECT * FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..0.08 rows=4 width=8)
   Rows out:  0 rows with 0.220 ms to end, start offset by 0.093 ms.
   ->  Append  (cost=0.00..0.04 rows=2 width=0)
         Rows out:  2 rows with 0.001 ms to first row, 0.003 ms to end, start offset by 0.262 ms.
         ->  Result  (cost=0.00..0.01 rows=1 width=0)
               Rows out:  1 rows with 0.001 ms to end, start offset by 0.262 ms.
         ->  Result  (cost=0.00..0.01 rows=1 width=0)
               Rows out:  1 rows with 0.001 ms to end, start offset by 0.264 ms.
   ->  Materialize  (cost=0.00..0.01 rows=1 width=0)
         Rows out:  0 rows with 0.167 ms to end of 3 scans, start offset by 0.265 ms.
         ->  Limit  (cost=0.00..0.00 rows=1 width=0)
               Rows out:  0 rows with 0.003 ms to end, start offset by 0.257 ms.
               ->  Subquery Scan t2  (cost=0.00..0.01 rows=1 width=0)
                     Rows out:  0 rows with 0.002 ms to end, start offset by 0.258 ms.
                     ->  Result  (cost=0.00..0.01 rows=1 width=0)
                           One-Time Filter: 3 = $0
                           Rows out:  0 rows with 0.001 ms to end, start offset by 0.258 ms.
 Slice statistics:
   (slice0)    Executor memory: 61K bytes.
 Statement statistics:
   Memory used: 128000K bytes
 Settings:  optimizer=off
 Optimizer status: legacy query optimizer
 Total runtime: 0.315 ms
(24 rows)
{noformat}
4) hawq 1.x with optimizer on (orca): work as expected
{noformat}
show optimizer;
 optimizer
-----------
 on
(1 row)

WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
     t2 AS ( SELECT 3 c3 )
SELECT * FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
 c1 | c2
----+----
  3 |  4
(1 row)

EXPLAIN ANALYZE
WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
     t2 AS ( SELECT 3 c3 )
SELECT * FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Hash EXISTS Join  (cost=0.00..0.00 rows=2 width=8)
   Hash Cond: "outer".c1 = "inner".c3
   Rows out:  1 rows with 0.835 ms to first row, 2.373 ms to end, start offset by 0.194 ms.
   Executor memory:  1K bytes.
   Work_mem used:  1K bytes. Workfile: (0 spilling, 0 reused)
   Hash chain length 1.0 avg, 1 max, using 1 of 524341 buckets.
   ->  Append  (cost=0.00..0.00 rows=2 width=8)
         Rows out:  2 rows with 0.002 ms to first row, 0.004 ms to end, start offset by 1.023 ms.
         ->  Result  (cost=0.00..0.00 rows=1 width=8)
               Rows out:  1 rows with 0.001 ms to first row, 0.002 ms to end, start offset by 1.023 ms.
               ->  Result  (cost=0.00..0.00 rows=1 width=1)
                     Rows out:  1 rows with 0 ms to end, start offset by 1.024 ms.
         ->  Result  (cost=0.00..0.00 rows=1 width=8)
               Rows out:  1 rows with 0.001 ms to end, start offset by 1.026 ms.
               ->  Result  (cost=0.00..0.00 rows=1 width=1)
                     Rows out:  1 rows with 0 ms to end, start offset by 1.026 ms.
   ->  Hash  (cost=0.00..0.00 rows=1 width=4)
         Rows in:  1 rows with 0.010 ms to end, start offset by 1.013 ms.
         ->  Result  (cost=0.00..0.00 rows=1 width=4)
               Rows out:  1 rows with 0.006 ms to first row, 0.007 ms to end, start offset by 1.013 ms.
               ->  Result  (cost=0.00..0.00 rows=1 width=1)
                     Rows out:  1 rows with 0.001 ms to end, start offset by 1.014 ms.
 Slice statistics:
   (slice0)    Executor memory: 8270K bytes.  Work_mem: 1K bytes max.
 Statement statistics:
   Memory used: 128000K bytes
 Optimizer status: PQO version 1.591
 Total runtime: 2.572 ms
(28 rows)
{noformat}

  was:
Here is the CTE query that return no tuple, while it should return 1 tuple actually.
{noformat}
WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),                                                                                                                                                                      t2 AS ( SELECT 3 c3 )                                                                                                                                                                                                             SELECT * FROM t1                                                                                                                                                                                                                       WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);                                                                                                                                                                                            

-- Actual
c1 | c2
----+----
(0 rows)

-- Expected
 c1 | c2
----+----
  3 |  4
(1 row)
{noformat}

The root cause is that during query planning, it is correct that t2 in the CTE (common table expression) clause is treated as subquery scan and then materialized. However, during query execution, it generate no tuple when t2 is evaluated. Thus, the join of t1 with t2 generate no tuple. We can see this in query execution statistics using explain analyze while running the query with optimizer = off.
{noformat}
   ->  Materialize  (cost=0.00..0.01 rows=1 width=0)
         Rows out:  0 rows with 0.167 ms to end of 3 scans, start offset by 0.265 ms.
         ->  Limit  (cost=0.00..0.00 rows=1 width=0)
               Rows out:  0 rows with 0.003 ms to end, start offset by 0.257 ms.
               ->  Subquery Scan t2  (cost=0.00..0.01 rows=1 width=0)
                     Rows out:  0 rows with 0.002 ms to end, start offset by 0.258 ms.
                     ->  Result  (cost=0.00..0.01 rows=1 width=0)
                           One-Time Filter: 3 = $0
                           Rows out:  0 rows with 0.001 ms to end, start offset by 0.258 ms.
{noformat}

Here is details:
1) hawq 2.0 with optimizer off (planner): subquery scan generate no tuple
{noformat}
show optimizer;
 optimizer
-----------
 off
(1 row)

WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
     t2 AS ( SELECT 3 c3 )
SELECT * FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
 c1 | c2
----+----
(0 rows)

EXPLAIN ANALYZE
WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
     t2 AS ( SELECT 3 c3 )
SELECT * FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
                                                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.05..0.29 rows=72 width=8)
   Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.237/0.237 ms to end.
   ->  Limit  (cost=0.00..0.00 rows=1 width=0)
         Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.003/0.003 ms to end.
         ->  Subquery Scan t2  (cost=0.00..0.01 rows=6 width=0)
               Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.002/0.002 ms to end.
               ->  Result  (cost=0.00..0.01 rows=1 width=0)
                     One-Time Filter: 3 = $0
                     Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0/0 ms to end.
   ->  Materialize  (cost=0.05..0.17 rows=12 width=8)
         Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0.129/0.129 ms to end, start offset by 0.135/0.135 ms.
         ->  Append  (cost=0.00..0.04 rows=2 width=0)
               Rows out:  Avg 2.0 rows x 1 workers.  Max/Last(/) 2/2 rows with 0.002/0.002 ms to first row, 0.004/0.004 ms to end, start offset by 0.255/0.255 ms.
               ->  Result  (cost=0.00..0.01 rows=1 width=0)
                     Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0.002/0.002 ms to end, start offset by 0.255/0.255 ms.
               ->  Result  (cost=0.00..0.01 rows=1 width=0)
                     Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0/0 ms to end, start offset by 0.261/0.261 ms.
 Slice statistics:
   (slice0)    Executor memory: 61K bytes.
 Statement statistics:
   Memory used: 128000K bytes
 Settings:  default_hash_table_bucket_number=6; optimizer=off
 Optimizer status: legacy query optimizer
 Data locality statistics:
   no data locality information in this query
 Total runtime: 0.372 ms
(26 rows)
{noformat}
2) hawq 2.0 with optimizer on (orca): fallback to planner and thus subquery scan generate no tuple
{noformat}
show optimizer;
 optimizer
-----------
 on
(1 row)

WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
     t2 AS ( SELECT 3 c3 )
SELECT * FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
 c1 | c2
----+----
(0 rows)

EXPLAIN ANALYZE
WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
     t2 AS ( SELECT 3 c3 )
SELECT * FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
                                                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.05..0.29 rows=72 width=8)
   Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.273/0.273 ms to end.
   ->  Limit  (cost=0.00..0.00 rows=1 width=0)
         Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.003/0.003 ms to end.
         ->  Subquery Scan t2  (cost=0.00..0.01 rows=6 width=0)
               Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.002/0.002 ms to end.
               ->  Result  (cost=0.00..0.01 rows=1 width=0)
                     One-Time Filter: 3 = $0
                     Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.001/0.001 ms to end.
   ->  Materialize  (cost=0.05..0.17 rows=12 width=8)
         Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0.151/0.151 ms to end, start offset by 0.189/0.189 ms.
         ->  Append  (cost=0.00..0.04 rows=2 width=0)
               Rows out:  Avg 2.0 rows x 1 workers.  Max/Last(/) 2/2 rows with 0.003/0.003 ms to first row, 0.004/0.004 ms to end, start offset by 0.327/0.327 ms.
               ->  Result  (cost=0.00..0.01 rows=1 width=0)
                     Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0.002/0.002 ms to end, start offset by 0.327/0.327 ms.
               ->  Result  (cost=0.00..0.01 rows=1 width=0)
                     Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0/0 ms to end, start offset by 0.337/0.337 ms.
 Slice statistics:
   (slice0)    Executor memory: 61K bytes.
 Statement statistics:
   Memory used: 128000K bytes
 Settings:  default_hash_table_bucket_number=6
 Optimizer status: legacy query optimizer
 Data locality statistics:
   no data locality information in this query
 Total runtime: 0.468 ms
(26 rows)
{noformat}
3) hawq 1.x with optimizer off (planner): subquery scan generate no tuple
{noformat}
show optimizer;
 optimizer
-----------
 off
(1 row)

WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
     t2 AS ( SELECT 3 c3 )
SELECT * FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
 c1 | c2
----+----
(0 rows)

EXPLAIN ANALYZE
WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
     t2 AS ( SELECT 3 c3 )
SELECT * FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..0.08 rows=4 width=8)
   Rows out:  0 rows with 0.220 ms to end, start offset by 0.093 ms.
   ->  Append  (cost=0.00..0.04 rows=2 width=0)
         Rows out:  2 rows with 0.001 ms to first row, 0.003 ms to end, start offset by 0.262 ms.
         ->  Result  (cost=0.00..0.01 rows=1 width=0)
               Rows out:  1 rows with 0.001 ms to end, start offset by 0.262 ms.
         ->  Result  (cost=0.00..0.01 rows=1 width=0)
               Rows out:  1 rows with 0.001 ms to end, start offset by 0.264 ms.
   ->  Materialize  (cost=0.00..0.01 rows=1 width=0)
         Rows out:  0 rows with 0.167 ms to end of 3 scans, start offset by 0.265 ms.
         ->  Limit  (cost=0.00..0.00 rows=1 width=0)
               Rows out:  0 rows with 0.003 ms to end, start offset by 0.257 ms.
               ->  Subquery Scan t2  (cost=0.00..0.01 rows=1 width=0)
                     Rows out:  0 rows with 0.002 ms to end, start offset by 0.258 ms.
                     ->  Result  (cost=0.00..0.01 rows=1 width=0)
                           One-Time Filter: 3 = $0
                           Rows out:  0 rows with 0.001 ms to end, start offset by 0.258 ms.
 Slice statistics:
   (slice0)    Executor memory: 61K bytes.
 Statement statistics:
   Memory used: 128000K bytes
 Settings:  optimizer=off
 Optimizer status: legacy query optimizer
 Total runtime: 0.315 ms
(24 rows)
{noformat}
4) hawq 1.x with optimizer on (orca): work as expected
{noformat}
show optimizer;
 optimizer
-----------
 on
(1 row)

WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
     t2 AS ( SELECT 3 c3 )
SELECT * FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
 c1 | c2
----+----
  3 |  4
(1 row)

EXPLAIN ANALYZE
WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
     t2 AS ( SELECT 3 c3 )
SELECT * FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Hash EXISTS Join  (cost=0.00..0.00 rows=2 width=8)
   Hash Cond: "outer".c1 = "inner".c3
   Rows out:  1 rows with 0.835 ms to first row, 2.373 ms to end, start offset by 0.194 ms.
   Executor memory:  1K bytes.
   Work_mem used:  1K bytes. Workfile: (0 spilling, 0 reused)
   Hash chain length 1.0 avg, 1 max, using 1 of 524341 buckets.
   ->  Append  (cost=0.00..0.00 rows=2 width=8)
         Rows out:  2 rows with 0.002 ms to first row, 0.004 ms to end, start offset by 1.023 ms.
         ->  Result  (cost=0.00..0.00 rows=1 width=8)
               Rows out:  1 rows with 0.001 ms to first row, 0.002 ms to end, start offset by 1.023 ms.
               ->  Result  (cost=0.00..0.00 rows=1 width=1)
                     Rows out:  1 rows with 0 ms to end, start offset by 1.024 ms.
         ->  Result  (cost=0.00..0.00 rows=1 width=8)
               Rows out:  1 rows with 0.001 ms to end, start offset by 1.026 ms.
               ->  Result  (cost=0.00..0.00 rows=1 width=1)
                     Rows out:  1 rows with 0 ms to end, start offset by 1.026 ms.
   ->  Hash  (cost=0.00..0.00 rows=1 width=4)
         Rows in:  1 rows with 0.010 ms to end, start offset by 1.013 ms.
         ->  Result  (cost=0.00..0.00 rows=1 width=4)
               Rows out:  1 rows with 0.006 ms to first row, 0.007 ms to end, start offset by 1.013 ms.
               ->  Result  (cost=0.00..0.00 rows=1 width=1)
                     Rows out:  1 rows with 0.001 ms to end, start offset by 1.014 ms.
 Slice statistics:
   (slice0)    Executor memory: 8270K bytes.  Work_mem: 1K bytes max.
 Statement statistics:
   Memory used: 128000K bytes
 Optimizer status: PQO version 1.591
 Total runtime: 2.572 ms
(28 rows)
{noformat}


> Subquery scan return no tuple in query with CTE
> -----------------------------------------------
>
>                 Key: HAWQ-884
>                 URL: https://issues.apache.org/jira/browse/HAWQ-884
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Query Execution
>            Reporter: Ruilong Huo
>            Assignee: Lei Chang
>
> Here is the CTE query that return no tuple, while it should return 1 tuple actually.
> {noformat}
> WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ), 
>           t2 AS ( SELECT 3 c3 )
> SELECT * FROM t1
> WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);                                                                                                                                                                                            
> -- Actual
> c1 | c2
> ----+----
> (0 rows)
> -- Expected
>  c1 | c2
> ----+----
>   3 |  4
> (1 row)
> {noformat}
> The root cause is that during query planning, it is correct that t2 in the CTE (common table expression) clause is treated as subquery scan and then materialized. However, during query execution, it generate no tuple when t2 is evaluated. Thus, the join of t1 with t2 generate no tuple. We can see this in query execution statistics using explain analyze while running the query with optimizer = off.
> {noformat}
>    ->  Materialize  (cost=0.00..0.01 rows=1 width=0)
>          Rows out:  0 rows with 0.167 ms to end of 3 scans, start offset by 0.265 ms.
>          ->  Limit  (cost=0.00..0.00 rows=1 width=0)
>                Rows out:  0 rows with 0.003 ms to end, start offset by 0.257 ms.
>                ->  Subquery Scan t2  (cost=0.00..0.01 rows=1 width=0)
>                      Rows out:  0 rows with 0.002 ms to end, start offset by 0.258 ms.
>                      ->  Result  (cost=0.00..0.01 rows=1 width=0)
>                            One-Time Filter: 3 = $0
>                            Rows out:  0 rows with 0.001 ms to end, start offset by 0.258 ms.
> {noformat}
> Here is details:
> 1) hawq 2.0 with optimizer off (planner): subquery scan generate no tuple
> {noformat}
> show optimizer;
>  optimizer
> -----------
>  off
> (1 row)
> WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
>      t2 AS ( SELECT 3 c3 )
> SELECT * FROM t1
> WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
>  c1 | c2
> ----+----
> (0 rows)
> EXPLAIN ANALYZE
> WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
>      t2 AS ( SELECT 3 c3 )
> SELECT * FROM t1
> WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
>                                                                             QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=0.05..0.29 rows=72 width=8)
>    Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.237/0.237 ms to end.
>    ->  Limit  (cost=0.00..0.00 rows=1 width=0)
>          Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.003/0.003 ms to end.
>          ->  Subquery Scan t2  (cost=0.00..0.01 rows=6 width=0)
>                Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.002/0.002 ms to end.
>                ->  Result  (cost=0.00..0.01 rows=1 width=0)
>                      One-Time Filter: 3 = $0
>                      Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0/0 ms to end.
>    ->  Materialize  (cost=0.05..0.17 rows=12 width=8)
>          Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0.129/0.129 ms to end, start offset by 0.135/0.135 ms.
>          ->  Append  (cost=0.00..0.04 rows=2 width=0)
>                Rows out:  Avg 2.0 rows x 1 workers.  Max/Last(/) 2/2 rows with 0.002/0.002 ms to first row, 0.004/0.004 ms to end, start offset by 0.255/0.255 ms.
>                ->  Result  (cost=0.00..0.01 rows=1 width=0)
>                      Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0.002/0.002 ms to end, start offset by 0.255/0.255 ms.
>                ->  Result  (cost=0.00..0.01 rows=1 width=0)
>                      Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0/0 ms to end, start offset by 0.261/0.261 ms.
>  Slice statistics:
>    (slice0)    Executor memory: 61K bytes.
>  Statement statistics:
>    Memory used: 128000K bytes
>  Settings:  default_hash_table_bucket_number=6; optimizer=off
>  Optimizer status: legacy query optimizer
>  Data locality statistics:
>    no data locality information in this query
>  Total runtime: 0.372 ms
> (26 rows)
> {noformat}
> 2) hawq 2.0 with optimizer on (orca): fallback to planner and thus subquery scan generate no tuple
> {noformat}
> show optimizer;
>  optimizer
> -----------
>  on
> (1 row)
> WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
>      t2 AS ( SELECT 3 c3 )
> SELECT * FROM t1
> WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
>  c1 | c2
> ----+----
> (0 rows)
> EXPLAIN ANALYZE
> WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
>      t2 AS ( SELECT 3 c3 )
> SELECT * FROM t1
> WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
>                                                                             QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=0.05..0.29 rows=72 width=8)
>    Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.273/0.273 ms to end.
>    ->  Limit  (cost=0.00..0.00 rows=1 width=0)
>          Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.003/0.003 ms to end.
>          ->  Subquery Scan t2  (cost=0.00..0.01 rows=6 width=0)
>                Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.002/0.002 ms to end.
>                ->  Result  (cost=0.00..0.01 rows=1 width=0)
>                      One-Time Filter: 3 = $0
>                      Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.001/0.001 ms to end.
>    ->  Materialize  (cost=0.05..0.17 rows=12 width=8)
>          Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0.151/0.151 ms to end, start offset by 0.189/0.189 ms.
>          ->  Append  (cost=0.00..0.04 rows=2 width=0)
>                Rows out:  Avg 2.0 rows x 1 workers.  Max/Last(/) 2/2 rows with 0.003/0.003 ms to first row, 0.004/0.004 ms to end, start offset by 0.327/0.327 ms.
>                ->  Result  (cost=0.00..0.01 rows=1 width=0)
>                      Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0.002/0.002 ms to end, start offset by 0.327/0.327 ms.
>                ->  Result  (cost=0.00..0.01 rows=1 width=0)
>                      Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0/0 ms to end, start offset by 0.337/0.337 ms.
>  Slice statistics:
>    (slice0)    Executor memory: 61K bytes.
>  Statement statistics:
>    Memory used: 128000K bytes
>  Settings:  default_hash_table_bucket_number=6
>  Optimizer status: legacy query optimizer
>  Data locality statistics:
>    no data locality information in this query
>  Total runtime: 0.468 ms
> (26 rows)
> {noformat}
> 3) hawq 1.x with optimizer off (planner): subquery scan generate no tuple
> {noformat}
> show optimizer;
>  optimizer
> -----------
>  off
> (1 row)
> WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
>      t2 AS ( SELECT 3 c3 )
> SELECT * FROM t1
> WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
>  c1 | c2
> ----+----
> (0 rows)
> EXPLAIN ANALYZE
> WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
>      t2 AS ( SELECT 3 c3 )
> SELECT * FROM t1
> WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
>                                             QUERY PLAN
> --------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..0.08 rows=4 width=8)
>    Rows out:  0 rows with 0.220 ms to end, start offset by 0.093 ms.
>    ->  Append  (cost=0.00..0.04 rows=2 width=0)
>          Rows out:  2 rows with 0.001 ms to first row, 0.003 ms to end, start offset by 0.262 ms.
>          ->  Result  (cost=0.00..0.01 rows=1 width=0)
>                Rows out:  1 rows with 0.001 ms to end, start offset by 0.262 ms.
>          ->  Result  (cost=0.00..0.01 rows=1 width=0)
>                Rows out:  1 rows with 0.001 ms to end, start offset by 0.264 ms.
>    ->  Materialize  (cost=0.00..0.01 rows=1 width=0)
>          Rows out:  0 rows with 0.167 ms to end of 3 scans, start offset by 0.265 ms.
>          ->  Limit  (cost=0.00..0.00 rows=1 width=0)
>                Rows out:  0 rows with 0.003 ms to end, start offset by 0.257 ms.
>                ->  Subquery Scan t2  (cost=0.00..0.01 rows=1 width=0)
>                      Rows out:  0 rows with 0.002 ms to end, start offset by 0.258 ms.
>                      ->  Result  (cost=0.00..0.01 rows=1 width=0)
>                            One-Time Filter: 3 = $0
>                            Rows out:  0 rows with 0.001 ms to end, start offset by 0.258 ms.
>  Slice statistics:
>    (slice0)    Executor memory: 61K bytes.
>  Statement statistics:
>    Memory used: 128000K bytes
>  Settings:  optimizer=off
>  Optimizer status: legacy query optimizer
>  Total runtime: 0.315 ms
> (24 rows)
> {noformat}
> 4) hawq 1.x with optimizer on (orca): work as expected
> {noformat}
> show optimizer;
>  optimizer
> -----------
>  on
> (1 row)
> WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
>      t2 AS ( SELECT 3 c3 )
> SELECT * FROM t1
> WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
>  c1 | c2
> ----+----
>   3 |  4
> (1 row)
> EXPLAIN ANALYZE
> WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
>      t2 AS ( SELECT 3 c3 )
> SELECT * FROM t1
> WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
>                                                QUERY PLAN
> --------------------------------------------------------------------------------------------------------
>  Hash EXISTS Join  (cost=0.00..0.00 rows=2 width=8)
>    Hash Cond: "outer".c1 = "inner".c3
>    Rows out:  1 rows with 0.835 ms to first row, 2.373 ms to end, start offset by 0.194 ms.
>    Executor memory:  1K bytes.
>    Work_mem used:  1K bytes. Workfile: (0 spilling, 0 reused)
>    Hash chain length 1.0 avg, 1 max, using 1 of 524341 buckets.
>    ->  Append  (cost=0.00..0.00 rows=2 width=8)
>          Rows out:  2 rows with 0.002 ms to first row, 0.004 ms to end, start offset by 1.023 ms.
>          ->  Result  (cost=0.00..0.00 rows=1 width=8)
>                Rows out:  1 rows with 0.001 ms to first row, 0.002 ms to end, start offset by 1.023 ms.
>                ->  Result  (cost=0.00..0.00 rows=1 width=1)
>                      Rows out:  1 rows with 0 ms to end, start offset by 1.024 ms.
>          ->  Result  (cost=0.00..0.00 rows=1 width=8)
>                Rows out:  1 rows with 0.001 ms to end, start offset by 1.026 ms.
>                ->  Result  (cost=0.00..0.00 rows=1 width=1)
>                      Rows out:  1 rows with 0 ms to end, start offset by 1.026 ms.
>    ->  Hash  (cost=0.00..0.00 rows=1 width=4)
>          Rows in:  1 rows with 0.010 ms to end, start offset by 1.013 ms.
>          ->  Result  (cost=0.00..0.00 rows=1 width=4)
>                Rows out:  1 rows with 0.006 ms to first row, 0.007 ms to end, start offset by 1.013 ms.
>                ->  Result  (cost=0.00..0.00 rows=1 width=1)
>                      Rows out:  1 rows with 0.001 ms to end, start offset by 1.014 ms.
>  Slice statistics:
>    (slice0)    Executor memory: 8270K bytes.  Work_mem: 1K bytes max.
>  Statement statistics:
>    Memory used: 128000K bytes
>  Optimizer status: PQO version 1.591
>  Total runtime: 2.572 ms
> (28 rows)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)