Posted to dev@hive.apache.org by "Daniel Wu (JIRA)" <ji...@apache.org> on 2011/08/15 15:07:27 UTC

[jira] [Created] (HIVE-2375) join multiple small tables with one big table in one mapside join?

join multiple small tables with one big table in one mapside join?
------------------------------------------------------------------

                 Key: HIVE-2375
                 URL: https://issues.apache.org/jira/browse/HIVE-2375
             Project: Hive
          Issue Type: New Feature
          Components: Query Processor
         Environment: not related
            Reporter: Daniel Wu
            Priority: Minor


http://mail-archives.apache.org/mod_mbox/hive-user/201108.mbox/%3C130db22f.4dc7.131c2caf8d0.Coremail.hadoop_wu@163.com%3E

Suppose we join 10 small tables (s1, s2, ..., s10) with one huge table (big) in a data warehouse
system (each join is between the big table and one small table, as in a star schema). Is it possible to
first build 10 hash tables, one for each small table, and then loop over each row in the big table:
if the row survives (it finds a match in every hash table), output it; if not, discard it?
This way we only need to read the big table once, instead of reading the big table, writing the
intermediate result, reading it back, and so on for each join.

The dataflow would be:
1: build 10 hash tables, one per small table
2: foreach row in big table
         probe each of the 10 hash tables with the row's join key
         if the row matches in all 10 hash tables, go to the next step (output, etc.)
         else discard the row
    end loop
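
The dataflow above can be sketched in a few lines of Python. This is only an illustration of the proposed single-pass idea, not how Hive implements map-side joins. The names build_hash_table and multi_way_map_join, the row-as-dict representation, and the assumption that column names do not collide across tables (other than the join keys) are all made up for this example.

```python
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple

Row = Dict[str, object]
HashTable = Dict[Hashable, List[Row]]


def build_hash_table(small_rows: Iterable[Row], key: str) -> HashTable:
    """Step 1: index one small table by its join key."""
    table: HashTable = {}
    for row in small_rows:
        table.setdefault(row[key], []).append(row)
    return table


def multi_way_map_join(
    big_rows: Iterable[Row],
    dims: List[Tuple[str, HashTable]],
) -> Iterator[Row]:
    """Step 2: scan the big table once, probing every hash table per row.

    dims is a list of (join column in the big table, hash table) pairs.
    A big-table row is emitted only if it matches in every hash table
    (inner-join semantics); otherwise it is discarded.
    """
    for big_row in big_rows:
        joined = [dict(big_row)]
        for fk_column, table in dims:
            matches = table.get(big_row[fk_column])
            if not matches:
                joined = []  # no match in this small table: discard the row
                break
            # One output row per combination of matching small-table rows.
            joined = [{**left, **right} for left in joined for right in matches]
        yield from joined


# Tiny example with two small tables instead of ten.
s1 = [{"k1": 1, "a": "x"}, {"k1": 2, "a": "y"}]
s2 = [{"k2": 10, "b": "p"}]
big = [
    {"k1": 1, "k2": 10, "v": 100},  # matches both small tables
    {"k1": 2, "k2": 99, "v": 200},  # no match in s2
    {"k1": 3, "k2": 10, "v": 300},  # no match in s1
]
dims = [("k1", build_hash_table(s1, "k1")), ("k2", build_hash_table(s2, "k2"))]
result = list(multi_way_map_join(big, dims))
```

The big table is iterated exactly once and no intermediate join result is written out, which is the point of the request. The sketch assumes all the hash tables fit in memory at the same time.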


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira