Posted to dev@hive.apache.org by "Liyin Tang (JIRA)" <ji...@apache.org> on 2010/11/13 02:16:15 UTC

[jira] Commented: (HIVE-1642) Convert join queries to map-join based on size of table/row

    [ https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931599#action_12931599 ] 

Liyin Tang commented on HIVE-1642:
----------------------------------

I just finished converting common joins into map joins based on file size. There are 3 flags to control this optimization.
1)	set hive.auto.convert.join = true; This enables the optimization. The flag is currently disabled by default so that existing test cases do not break. I also added 25 test cases, auto_join0.q - auto_join25.q, which cover the optimization code.
2)	set hive.hashtable.max.memory.usage = 0.9; This means that if the memory usage of the local task exceeds 90% of its heap size, the local task aborts itself. The Driver then knows the local work failed and does not submit the MapJoinTask (a map-only MapRedTask) to Hadoop; instead, it submits the original CommonJoinTask to Hadoop to run.
3)	set hive.smalltable.filesize = 25000000L; This means that if the total file size of the small tables is less than 25M, the map join task runs. If not, the original common join task runs.
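
As a rough illustration of how the three flags interact, here is a minimal Python sketch. It is not Hive code: the function names and the byte-based arguments are hypothetical, and only the default values (25000000 and 0.9) come from the flags above.

```python
def small_tables_fit(small_table_sizes, threshold_bytes=25_000_000):
    """Mirrors hive.smalltable.filesize: the combined size of the
    small tables must be strictly below the threshold."""
    return sum(small_table_sizes) < threshold_bytes


def local_task_should_abort(used_bytes, max_heap_bytes, max_usage=0.9):
    """Mirrors hive.hashtable.max.memory.usage: the local task gives up
    once it uses more than max_usage of its heap."""
    return used_bytes > max_usage * max_heap_bytes


def choose_task(auto_convert, small_table_sizes, threshold_bytes=25_000_000):
    """Mirrors hive.auto.convert.join: without it, always run the common join."""
    if auto_convert and small_tables_fit(small_table_sizes, threshold_bytes):
        return "map join"
    return "common join"
```

For example, with the optimization enabled, small tables of 10 MB and 5 MB select the map join, while 20 MB and 5 MB fall back to the common join because the total is not below 25000000 bytes.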
The basic flow is as follows. For each common join, create a conditional task.
1)	For each table in the join, generate a map join task that assumes this table is the big table.
a.	The left side of a right outer join must be a small table.
b.	The right side of a left outer join must be a small table.
c.	A full outer join cannot be optimized.
d.	E.g. A left outer join B right outer join C. Only C can be the big table.
e.	E.g. A right outer join B left outer join C. Only B can be the big table.
f.	E.g. A left outer join B left outer join C. Only A can be the big table.
g.	E.g. A right outer join B right outer join C. Either B or C can be the big table.
2)	Put all the generated map join tasks into the conditional task and map each big table's alias to its corresponding map join task.
3)	At execution time, the resolver reads the input file sizes. If the input file size of the small tables is less than the threshold, then run the converted map join task.
4)	Give each map join task a backup task. The backup task is the original common join task.
This mapping is set at execution time.
5)	If the map join task exits abnormally, launch the backup task.
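
Steps 2) to 5) above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the actual Hive resolver: tasks are modelled as plain callables, an abnormal exit is modelled as a RuntimeError, and picking the candidate with the smallest small-table total is an assumed tie-break that the description above does not specify.

```python
def resolve(map_join_tasks, table_sizes, threshold_bytes):
    """Pick a map join task whose small tables fit under the threshold.

    map_join_tasks: big-table alias -> map join task (step 2).
    table_sizes:    alias -> input file size in bytes (step 3).
    Returns None when no candidate qualifies.
    """
    total = sum(table_sizes.values())
    best = None
    for big_alias, task in map_join_tasks.items():
        # Every table except the assumed big table counts as a small table.
        small_size = total - table_sizes[big_alias]
        if small_size < threshold_bytes and (best is None or small_size < best[0]):
            best = (small_size, task)
    return best[1] if best else None


def run_join(map_join_tasks, common_join_task, table_sizes, threshold_bytes):
    """Run the chosen map join task, falling back to the common join."""
    task = resolve(map_join_tasks, table_sizes, threshold_bytes)
    if task is None:
        return common_join_task()
    try:
        return task()
    except RuntimeError:
        # Step 5: the map join exited abnormally, so launch the backup task.
        return common_join_task()
```

With table a at 10 MB and table b at 900 MB, only the task that treats b as the big table passes a 25000000-byte threshold; if that task raises, the common join runs instead.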



> Convert join queries to map-join based on size of table/row
> -----------------------------------------------------------
>
>                 Key: HIVE-1642
>                 URL: https://issues.apache.org/jira/browse/HIVE-1642
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Liyin Tang
>             Fix For: 0.7.0
>
>
> Based on the number of rows and size of each table, Hive should automatically be able to convert a join into map-join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.