Posted to issues@calcite.apache.org by "Vladimir Sitnikov (JIRA)" <ji...@apache.org> on 2014/08/18 12:17:19 UTC

[jira] [Created] (OPTIQ-379) Alternative implementation of semi-join: both sides should be considered for building a map

Vladimir Sitnikov created OPTIQ-379:
---------------------------------------

             Summary: Alternative implementation of semi-join: both sides should be considered for building a map
                 Key: OPTIQ-379
                 URL: https://issues.apache.org/jira/browse/OPTIQ-379
             Project: Optiq
          Issue Type: New Feature
            Reporter: Vladimir Sitnikov
            Assignee: Julian Hyde


When implementing a semi-join, the map can be built from either of the two inputs (see \[1\]).
In general, it looks more efficient to build the map over the smaller input, which avoids materializing the larger one.

Consider the following query:
{code:sql}select * from "hr"."emps"
where exists (
  select 1 from "hr"."depts" where "depts"."deptno" = "emps"."deptno");{code}

There is a trade-off (assuming a semi-join is used and no spill-to-disk happens):
1) If the semi-join is implemented as BuildMap(Scan(depts)) followed by a scan through emps, the map takes {{count(distinct depts.deptno)\*(map_entry_overhead + avg_size_of_deptno_column)}} bytes, since only the distinct join keys need to be kept.

2) If the semi-join is implemented as BuildMap(Scan(emps)) followed by a scan through depts, the map takes {{count(emps.\*)\*(map_entry_overhead + avg_size_of_emps_row)}} bytes, since every emps row has to be kept until its key is seen in depts.

The same applies to anti-joins.
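
To make the two orientations concrete, here is a minimal Python sketch. It is not Optiq code: the function names, the tuple layout, and the sample rows are all hypothetical and serve only to show what each variant has to hold in memory.

```python
from collections import defaultdict

def semi_join_build_right(left_rows, right_rows, left_key, right_key):
    """Option 1: build a set of distinct keys from the right input, stream the left."""
    keys = {right_key(r) for r in right_rows}  # one entry per distinct key
    for row in left_rows:
        if left_key(row) in keys:
            yield row

def semi_join_build_left(left_rows, right_rows, left_key, right_key):
    """Option 2: build a key -> rows map from the left input, stream the right."""
    by_key = defaultdict(list)
    for row in left_rows:  # every left row is materialized
        by_key[left_key(row)].append(row)
    for r in right_rows:
        # pop() so that each left row is emitted at most once,
        # even if the right input contains duplicate keys
        yield from by_key.pop(right_key(r), ())

# Hypothetical sample data: (name, deptno) and (deptno, name)
emps = [("Bill", 10), ("Eric", 20), ("Sebastian", 10), ("Theodore", 30)]
depts = [(10, "Sales"), (20, "Marketing"), (40, "HR")]

emp_deptno = lambda e: e[1]
dept_deptno = lambda d: d[0]

via_depts = list(semi_join_build_right(emps, depts, emp_deptno, dept_deptno))
via_emps = list(semi_join_build_left(emps, depts, emp_deptno, dept_deptno))
```

Both variants return the same set of emps rows, but in a different order: the first preserves the order of emps, while the second emits rows grouped by the order in which keys appear in depts. For an anti-join, the first variant would emit rows whose key is absent from the set, and the second would emit whatever remains in the map after the right input has been consumed.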

\[1\]: [Semi-join orientation|http://mail-archives.apache.org/mod_mbox/optiq-dev/201408.mbox/browser]


