You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tajo.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/06/13 19:57:01 UTC
[jira] [Commented] (TAJO-1352) Improve the join order algorithm to
consider missed cases of associative join operators
[ https://issues.apache.org/jira/browse/TAJO-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584744#comment-14584744 ]
ASF GitHub Bot commented on TAJO-1352:
--------------------------------------
Github user jihoonson commented on the pull request:
https://github.com/apache/tajo/pull/593#issuecomment-111736855
Here is the simple evaluation result.
### Query
```
select
count(*)
from
lineitem left outer join orders
on l_orderkey = o_orderkey
left outer join partsupp
on ps_suppkey = o_custkey
left outer join customer
on ps_suppkey = c_custkey
left outer join part
on p_partkey = c_nationkey
```
### Data
* TPC-H of scale factor 100
### Cluster
* One master and 3 workers
* Each worker equips 4 cores, 8 GB memory, and 2 HDDs.
### Performance comparison
* Elapsed time
* Before patch: 25 mins, 23 sec
* After patch: 19 mins, 59 sec
* Performance improvement ratio: about 20%
### Query plan
The query execution time is reduced due to the improved query plan as follows.
#### Before patch
```
-----------------------------
Query Block Graph
-----------------------------
|-#ROOT
-----------------------------
Optimization Log:
[LogicalPlan]
> ProjectionNode is eliminated.
[#ROOT]
> Non-optimized join order: ((((tpch100.lineitem ⟕ tpch100.orders) ⟕ tpch100.partsupp) ⟕ tpch100.customer) ⟕ tpch100.part) (cost: 1.0447965953264456E46)
> Optimized join order : ((((tpch100.lineitem ⟕ tpch100.orders) ⟕ tpch100.partsupp) ⟕ tpch100.customer) ⟕ tpch100.part) (cost: 1.0447965859707236E46)
-----------------------------
GROUP_BY(10)()
=> exprs: (count())
=> target list: ?count (INT8)
=> out schema:{(1) ?count (INT8)}
=> in schema:{(0) }
JOIN(15)(LEFT_OUTER)
=> Join Cond: tpch100.part.p_partkey (INT8) = tpch100.customer.c_nationkey (INT8)
=> target list:
=> out schema: {(0) }
=> in schema: {(2) tpch100.customer.c_nationkey (INT8), tpch100.part.p_partkey (INT8)}
SCAN(7) on tpch100.part
=> target list: tpch100.part.p_partkey (INT8)
=> out schema: {(1) tpch100.part.p_partkey (INT8)}
=> in schema: {(9) tpch100.part.p_partkey (INT8), tpch100.part.p_name (TEXT), tpch100.part.p_mfgr (TEXT), tpch100.part.p_brand (TEXT), tpch100.part.p_type (TEXT), tpch100.part.p_size (INT4), tpch100.part.p_container (TEXT), tpch100.part.p_retailprice (FLOAT8), tpch100.part.p_comment (TEXT)}
JOIN(14)(LEFT_OUTER)
=> Join Cond: tpch100.partsupp.ps_suppkey (INT8) = tpch100.customer.c_custkey (INT8)
=> target list: tpch100.customer.c_nationkey (INT8)
=> out schema: {(1) tpch100.customer.c_nationkey (INT8)}
=> in schema: {(3) tpch100.partsupp.ps_suppkey (INT8), tpch100.customer.c_nationkey (INT8), tpch100.customer.c_custkey (INT8)}
SCAN(5) on tpch100.customer
=> target list: tpch100.customer.c_nationkey (INT8), tpch100.customer.c_custkey (INT8)
=> out schema: {(2) tpch100.customer.c_nationkey (INT8), tpch100.customer.c_custkey (INT8)}
=> in schema: {(8) tpch100.customer.c_custkey (INT8), tpch100.customer.c_name (TEXT), tpch100.customer.c_address (TEXT), tpch100.customer.c_nationkey (INT8), tpch100.customer.c_phone (TEXT), tpch100.customer.c_acctbal (FLOAT8), tpch100.customer.c_mktsegment (TEXT), tpch100.customer.c_comment (TEXT)}
JOIN(13)(LEFT_OUTER)
=> Join Cond: tpch100.partsupp.ps_suppkey (INT8) = tpch100.orders.o_custkey (INT8)
=> target list: tpch100.partsupp.ps_suppkey (INT8)
=> out schema: {(1) tpch100.partsupp.ps_suppkey (INT8)}
=> in schema: {(2) tpch100.orders.o_custkey (INT8), tpch100.partsupp.ps_suppkey (INT8)}
SCAN(3) on tpch100.partsupp
=> target list: tpch100.partsupp.ps_suppkey (INT8)
=> out schema: {(1) tpch100.partsupp.ps_suppkey (INT8)}
=> in schema: {(5) tpch100.partsupp.ps_partkey (INT8), tpch100.partsupp.ps_suppkey (INT8), tpch100.partsupp.ps_availqty (INT4), tpch100.partsupp.ps_supplycost (FLOAT8), tpch100.partsupp.ps_comment (TEXT)}
JOIN(12)(LEFT_OUTER)
=> Join Cond: tpch100.lineitem.l_orderkey (INT8) = tpch100.orders.o_orderkey (INT8)
=> target list: tpch100.orders.o_custkey (INT8)
=> out schema: {(1) tpch100.orders.o_custkey (INT8)}
=> in schema: {(3) tpch100.lineitem.l_orderkey (INT8), tpch100.orders.o_custkey (INT8), tpch100.orders.o_orderkey (INT8)}
SCAN(1) on tpch100.orders
=> target list: tpch100.orders.o_custkey (INT8), tpch100.orders.o_orderkey (INT8)
=> out schema: {(2) tpch100.orders.o_custkey (INT8), tpch100.orders.o_orderkey (INT8)}
=> in schema: {(9) tpch100.orders.o_orderkey (INT8), tpch100.orders.o_custkey (INT8), tpch100.orders.o_orderstatus (TEXT), tpch100.orders.o_totalprice (FLOAT8), tpch100.orders.o_orderdate (DATE), tpch100.orders.o_orderpriority (TEXT), tpch100.orders.o_clerk (TEXT), tpch100.orders.o_shippriority (INT4), tpch100.orders.o_comment (TEXT)}
SCAN(0) on tpch100.lineitem
=> target list: tpch100.lineitem.l_orderkey (INT8)
=> out schema: {(1) tpch100.lineitem.l_orderkey (INT8)}
=> in schema: {(16) tpch100.lineitem.l_orderkey (INT8), tpch100.lineitem.l_partkey (INT8), tpch100.lineitem.l_suppkey (INT8), tpch100.lineitem.l_linenumber (INT8), tpch100.lineitem.l_quantity (FLOAT8), tpch100.lineitem.l_extendedprice (FLOAT8), tpch100.lineitem.l_discount (FLOAT8), tpch100.lineitem.l_tax (FLOAT8), tpch100.lineitem.l_returnflag (TEXT), tpch100.lineitem.l_linestatus (TEXT), tpch100.lineitem.l_shipdate (DATE), tpch100.lineitem.l_commitdate (DATE), tpch100.lineitem.l_receiptdate (DATE), tpch100.lineitem.l_shipinstruct (TEXT), tpch100.lineitem.l_shipmode (TEXT), tpch100.lineitem.l_comment (TEXT)}
```
#### After patch
```
-----------------------------
Query Block Graph
-----------------------------
|-#ROOT
-----------------------------
Optimization Log:
[LogicalPlan]
> ProjectionNode is eliminated.
[#ROOT]
> Non-optimized join order: ((((tpch100.lineitem ⟕ tpch100.orders) ⟕ tpch100.partsupp) ⟕ tpch100.customer) ⟕ tpch100.part) (cost: 7.933924122078524E46)
> Optimized join order : ((tpch100.lineitem ⟕ (tpch100.orders ⟕ tpch100.partsupp)) ⟕ (tpch100.customer ⟕ tpch100.part)) (cost: 4.016549062824562E47)
-----------------------------
GROUP_BY(10)()
=> exprs: (count())
=> target list: ?count (INT8)
=> out schema:{(1) ?count (INT8)}
=> in schema:{(0) }
JOIN(15)(LEFT_OUTER)
=> Join Cond: tpch100.partsupp.ps_suppkey (INT8) = tpch100.customer.c_custkey (INT8)
=> target list:
=> out schema: {(0) }
=> in schema: {(2) tpch100.partsupp.ps_suppkey (INT8), tpch100.customer.c_custkey (INT8)}
JOIN(14)(LEFT_OUTER)
=> Join Cond: tpch100.part.p_partkey (INT8) = tpch100.customer.c_nationkey (INT8)
=> target list: tpch100.customer.c_custkey (INT8)
=> out schema: {(1) tpch100.customer.c_custkey (INT8)}
=> in schema: {(3) tpch100.customer.c_custkey (INT8), tpch100.customer.c_nationkey (INT8), tpch100.part.p_partkey (INT8)}
SCAN(7) on tpch100.part
=> target list: tpch100.part.p_partkey (INT8)
=> out schema: {(1) tpch100.part.p_partkey (INT8)}
=> in schema: {(9) tpch100.part.p_partkey (INT8), tpch100.part.p_name (TEXT), tpch100.part.p_mfgr (TEXT), tpch100.part.p_brand (TEXT), tpch100.part.p_type (TEXT), tpch100.part.p_size (INT4), tpch100.part.p_container (TEXT), tpch100.part.p_retailprice (FLOAT8), tpch100.part.p_comment (TEXT)}
SCAN(5) on tpch100.customer
=> target list: tpch100.customer.c_custkey (INT8), tpch100.customer.c_nationkey (INT8)
=> out schema: {(2) tpch100.customer.c_custkey (INT8), tpch100.customer.c_nationkey (INT8)}
=> in schema: {(8) tpch100.customer.c_custkey (INT8), tpch100.customer.c_name (TEXT), tpch100.customer.c_address (TEXT), tpch100.customer.c_nationkey (INT8), tpch100.customer.c_phone (TEXT), tpch100.customer.c_acctbal (FLOAT8), tpch100.customer.c_mktsegment (TEXT), tpch100.customer.c_comment (TEXT)}
JOIN(13)(LEFT_OUTER)
=> Join Cond: tpch100.lineitem.l_orderkey (INT8) = tpch100.orders.o_orderkey (INT8)
=> target list: tpch100.partsupp.ps_suppkey (INT8)
=> out schema: {(1) tpch100.partsupp.ps_suppkey (INT8)}
=> in schema: {(3) tpch100.lineitem.l_orderkey (INT8), tpch100.partsupp.ps_suppkey (INT8), tpch100.orders.o_orderkey (INT8)}
JOIN(12)(LEFT_OUTER)
=> Join Cond: tpch100.partsupp.ps_suppkey (INT8) = tpch100.orders.o_custkey (INT8)
=> target list: tpch100.partsupp.ps_suppkey (INT8), tpch100.orders.o_orderkey (INT8)
=> out schema: {(2) tpch100.partsupp.ps_suppkey (INT8), tpch100.orders.o_orderkey (INT8)}
=> in schema: {(3) tpch100.orders.o_orderkey (INT8), tpch100.orders.o_custkey (INT8), tpch100.partsupp.ps_suppkey (INT8)}
SCAN(3) on tpch100.partsupp
=> target list: tpch100.partsupp.ps_suppkey (INT8)
=> out schema: {(1) tpch100.partsupp.ps_suppkey (INT8)}
=> in schema: {(5) tpch100.partsupp.ps_partkey (INT8), tpch100.partsupp.ps_suppkey (INT8), tpch100.partsupp.ps_availqty (INT4), tpch100.partsupp.ps_supplycost (FLOAT8), tpch100.partsupp.ps_comment (TEXT)}
SCAN(1) on tpch100.orders
=> target list: tpch100.orders.o_orderkey (INT8), tpch100.orders.o_custkey (INT8)
=> out schema: {(2) tpch100.orders.o_orderkey (INT8), tpch100.orders.o_custkey (INT8)}
=> in schema: {(9) tpch100.orders.o_orderkey (INT8), tpch100.orders.o_custkey (INT8), tpch100.orders.o_orderstatus (TEXT), tpch100.orders.o_totalprice (FLOAT8), tpch100.orders.o_orderdate (DATE), tpch100.orders.o_orderpriority (TEXT), tpch100.orders.o_clerk (TEXT), tpch100.orders.o_shippriority (INT4), tpch100.orders.o_comment (TEXT)}
SCAN(0) on tpch100.lineitem
=> target list: tpch100.lineitem.l_orderkey (INT8)
=> out schema: {(1) tpch100.lineitem.l_orderkey (INT8)}
=> in schema: {(16) tpch100.lineitem.l_orderkey (INT8), tpch100.lineitem.l_partkey (INT8), tpch100.lineitem.l_suppkey (INT8), tpch100.lineitem.l_linenumber (INT8), tpch100.lineitem.l_quantity (FLOAT8), tpch100.lineitem.l_extendedprice (FLOAT8), tpch100.lineitem.l_discount (FLOAT8), tpch100.lineitem.l_tax (FLOAT8), tpch100.lineitem.l_returnflag (TEXT), tpch100.lineitem.l_linestatus (TEXT), tpch100.lineitem.l_shipdate (DATE), tpch100.lineitem.l_commitdate (DATE), tpch100.lineitem.l_receiptdate (DATE), tpch100.lineitem.l_shipinstruct (TEXT), tpch100.lineitem.l_shipmode (TEXT), tpch100.lineitem.l_comment (TEXT)}
```
> Improve the join order algorithm to consider missed cases of associative join operators
> ---------------------------------------------------------------------------------------
>
> Key: TAJO-1352
> URL: https://issues.apache.org/jira/browse/TAJO-1352
> Project: Tajo
> Issue Type: Improvement
> Components: planner/optimizer
> Reporter: Jihoon Son
> Assignee: Jihoon Son
> Attachments: JoinGraph.png
>
>
> TAJO-1277 fixes a bug related to the associativity of join operators, but there are still some missed cases that join operators are associative. This work should include the cases described in the following links:
> * http://stackoverflow.com/questions/20022196/are-left-outer-joins-associative
> * https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)