Posted to issues@tajo.apache.org by "Hyunsik Choi (JIRA)" <ji...@apache.org> on 2014/06/27 09:00:33 UTC
[jira] [Created] (TAJO-893) Shared data flow should be supported.
Hyunsik Choi created TAJO-893:
---------------------------------
Summary: Shared data flow should be supported.
Key: TAJO-893
URL: https://issues.apache.org/jira/browse/TAJO-893
Project: Tajo
Issue Type: Sub-task
Components: DAG
Reporter: Hyunsik Choi
Please see the following example (TPC-H Q2). This query performs the same 5-relation join twice: once in the scalar subquery and once in the outer query block. If the DAG framework supported a shared data channel and we reused the result of that join, the query could avoid duplicated scans, data shuffles, and joins.
For this feature, we should first support multiple output data channels. In addition, we should support a shared data channel that transmits the same intermediate data without duplicated shuffles.
Please see also TAJO-161. TAJO-161 would make good use of this feature.
{code}
select
s_acctbal,
s_name,
n_name,
p_partkey,
p_mfgr,
s_address,
s_phone,
s_comment
from
part,
supplier,
partsupp,
nation,
region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and p_size = 15
and p_type like '%BRASS'
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'EUROPE'
and ps_supplycost =
(
select min(ps_supplycost) from partsupp, supplier, nation, region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'EUROPE'
)
order by
s_acctbal desc,
n_name,
s_name,
p_partkey
{code}
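The reuse described above can be illustrated with a small sketch. This is not Tajo code: `SharedChannel`, `join_european_supply`, and the toy tables are hypothetical names invented for illustration, and the nation/region joins are collapsed into a single region field on the supplier to keep it short. The sketch only shows the idea that one producer (the common join) runs once and feeds two consumers (the min() aggregation of the subquery and the filter of the outer block).

```python
from typing import Callable, Dict, List, Optional, Tuple

# Toy data (assumed for illustration). Supplier maps suppkey -> (name, region),
# which stands in for the supplier/nation/region joins of Q2.
PARTSUPP = [
    {"ps_partkey": 1, "ps_suppkey": 10, "ps_supplycost": 5.0},
    {"ps_partkey": 1, "ps_suppkey": 11, "ps_supplycost": 3.0},
    {"ps_partkey": 2, "ps_suppkey": 10, "ps_supplycost": 7.0},
    {"ps_partkey": 2, "ps_suppkey": 12, "ps_supplycost": 1.0},
]
SUPPLIER = {
    10: ("Supplier#10", "EUROPE"),
    11: ("Supplier#11", "EUROPE"),
    12: ("Supplier#12", "ASIA"),
}


class SharedChannel:
    """Hypothetical channel: runs its producer once, serves many consumers."""

    def __init__(self, producer: Callable[[], List[dict]]):
        self._producer = producer
        self._rows: Optional[List[dict]] = None
        self.producer_runs = 0  # counts how often the upstream join executed

    def read(self) -> List[dict]:
        if self._rows is None:
            self._rows = self._producer()
            self.producer_runs += 1
        return self._rows


def join_european_supply() -> List[dict]:
    """The common sub-plan: partsupp joined with suppliers located in EUROPE."""
    rows = []
    for ps in PARTSUPP:
        name, region = SUPPLIER[ps["ps_suppkey"]]
        if region == "EUROPE":
            rows.append({**ps, "s_name": name})
    return rows


def min_cost_per_part(channel: SharedChannel) -> Dict[int, float]:
    """Consumer 1: the scalar subquery, min(ps_supplycost) per part."""
    best: Dict[int, float] = {}
    for row in channel.read():
        key, cost = row["ps_partkey"], row["ps_supplycost"]
        best[key] = min(cost, best.get(key, cost))
    return best


def cheapest_suppliers(
    channel: SharedChannel, best: Dict[int, float]
) -> List[Tuple[int, str, float]]:
    """Consumer 2: the outer block, keeping rows whose cost equals the minimum."""
    return [
        (row["ps_partkey"], row["s_name"], row["ps_supplycost"])
        for row in channel.read()
        if row["ps_supplycost"] == best[row["ps_partkey"]]
    ]


def run_query() -> Tuple[List[Tuple[int, str, float]], int]:
    channel = SharedChannel(join_european_supply)
    best = min_cost_per_part(channel)
    result = cheapest_suppliers(channel, best)
    return result, channel.producer_runs
```

Both consumers read from the same channel, so the join is evaluated only once. In a distributed engine the equivalent saving would be in scans and shuffles, not just in-memory work.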