
Posted to issues@tajo.apache.org by "Hyunsik Choi (JIRA)" <ji...@apache.org> on 2014/06/27 09:00:33 UTC

[jira] [Created] (TAJO-893) Shared data flow should be supported.

Hyunsik Choi created TAJO-893:
---------------------------------

             Summary: Shared data flow should be supported.
                 Key: TAJO-893
                 URL: https://issues.apache.org/jira/browse/TAJO-893
             Project: Tajo
          Issue Type: Sub-task
          Components: DAG
            Reporter: Hyunsik Choi


Please see the following example (TPC-H Q2). This query performs the 5-relation join twice: once in the scalar subquery and once in the outer query block. If the DAG framework supported a shared data channel and we reused the result of the 5-relation join, the query could avoid duplicated scans, data shuffles, and joins.

For this feature, we should first support multiple output data channels. In addition, we should support a shared data channel that transmits the same intermediate data without duplicated shuffles.

Please see also TAJO-161. TAJO-161 would make good use of this feature.

{code}
select
  s_acctbal,
  s_name,
  n_name,
  p_partkey,
  p_mfgr,
  s_address,
  s_phone,
  s_comment
from
  part,
  supplier,
  partsupp,
  nation,
  region
where
  p_partkey = ps_partkey
  and s_suppkey = ps_suppkey
  and p_size = 15
  and p_type like '%BRASS'
  and s_nationkey = n_nationkey
  and n_regionkey = r_regionkey
  and r_name = 'EUROPE'
  and ps_supplycost =
    (
      select min(ps_supplycost) from partsupp, supplier, nation, region
      where
        p_partkey = ps_partkey
        and s_suppkey = ps_suppkey
        and s_nationkey = n_nationkey
        and n_regionkey = r_regionkey
        and r_name = 'EUROPE'
    )
order by 
  s_acctbal desc, 
  n_name, 
  s_name, 
  p_partkey
{code}
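
The shared data channel idea described above can be illustrated with a minimal sketch. This is not Tajo code: SharedChannel, join_block, and the toy rows are hypothetical names and data invented for illustration. It models only the sharing aspect, where one producer's result is materialized once and read by two consumers (the min-cost aggregation and the outer filter). It does not model shuffles or distributed execution.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Tuple[int, str, float]  # (partkey, supplier name, supply cost)


class SharedChannel:
    """Hypothetical channel: one producer, many consumers, computed once."""

    def __init__(self, name: str, producer: Callable[[], List[Row]]) -> None:
        self.name = name
        self._producer = producer
        self._rows: Optional[List[Row]] = None
        self.producer_runs = 0  # how many times the producer actually ran

    def read(self) -> List[Row]:
        # Run the producer only on the first read; later reads reuse the rows.
        if self._rows is None:
            self._rows = list(self._producer())
            self.producer_runs += 1
        return self._rows


def join_block() -> List[Row]:
    # Stand-in for the shared join result (toy rows, not real TPC-H data).
    return [(1, "s1", 10.0), (1, "s2", 7.5), (2, "s3", 3.0)]


def min_cost_per_part(channel: SharedChannel) -> Dict[int, float]:
    # Consumer 1: plays the role of the scalar subquery (min cost per part).
    best: Dict[int, float] = {}
    for partkey, _, cost in channel.read():
        best[partkey] = min(cost, best.get(partkey, cost))
    return best


def cheapest_suppliers(channel: SharedChannel,
                       best: Dict[int, float]) -> List[Tuple[int, str]]:
    # Consumer 2: plays the role of the outer block, keeping min-cost rows.
    return [(p, s) for p, s, c in channel.read() if c == best[p]]


channel = SharedChannel("joined", join_block)
best = min_cost_per_part(channel)
result = cheapest_suppliers(channel, best)
print(result, "producer runs:", channel.producer_runs)
```

Both consumers read from the same channel object, so the stand-in join runs only once. Without sharing, each consumer would trigger its own scan and join.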



--
This message was sent by Atlassian JIRA
(v6.2#6252)