Posted to issues@hive.apache.org by "Xuefu Zhang (JIRA)" <ji...@apache.org> on 2015/06/01 23:16:18 UTC

[jira] [Updated] (HIVE-10302) Load small tables (for map join) in executor memory only once [Spark Branch]

     [ https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-10302:
-------------------------------
    Attachment: 10302.patch

Patch 10302.patch (named without the "HIVE-" prefix) is the result of rebasing onto the latest master; it is the version that was actually committed to master.

> Load small tables (for map join) in executor memory only once [Spark Branch]
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-10302
>                 URL: https://issues.apache.org/jira/browse/HIVE-10302
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Jimmy Xiang
>            Assignee: Jimmy Xiang
>             Fix For: spark-branch
>
>         Attachments: 10302.patch, HIVE-10302.2-spark.patch, HIVE-10302.3-spark.patch, HIVE-10302.4-spark.patch, HIVE-10302.spark-1.patch
>
>
> Usually there are multiple cores in a Spark executor, so multiple map-join tasks can run in the same executor (concurrently or sequentially). Currently, each task loads its own copy of the map-join small tables into memory, which is inefficient. Ideally, the small tables are loaded only once and shared among all tasks running in that executor.
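
The once-per-executor sharing described in the issue can be sketched as a static, thread-safe cache that lives in the executor JVM. This is a minimal illustration under stated assumptions, not the code in 10302.patch: the class name `ExecutorSmallTableCache`, the methods `getOrLoad` and `clear`, the table path, and the string-to-string table type are all hypothetical. It relies on `ConcurrentHashMap.computeIfAbsent`, which applies the mapping function at most once per key, so tasks arriving concurrently or later in the same executor reuse the first loaded copy.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical per-executor cache for map-join small tables (not Hive's actual class).
 * The static map lives as long as the executor JVM, so every task in that executor,
 * whether running concurrently or one after another, sees the same loaded table.
 */
public final class ExecutorSmallTableCache {
    // One entry per small-table path; static, so shared by all tasks in this JVM.
    private static final ConcurrentHashMap<String, Map<String, String>> TABLES =
            new ConcurrentHashMap<>();

    private ExecutorSmallTableCache() {}

    /** Returns the cached table, invoking the loader at most once per path. */
    public static Map<String, String> getOrLoad(
            String path, Function<String, Map<String, String>> loader) {
        // computeIfAbsent is atomic: concurrent callers for the same path wait for
        // the first load instead of loading a second copy into memory.
        return TABLES.computeIfAbsent(path, loader);
    }

    /** Drops all cached tables; when to evict (e.g. between queries) is an assumption. */
    public static void clear() {
        TABLES.clear();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger loads = new AtomicInteger();
        // Hypothetical loader standing in for reading a small table from disk.
        Function<String, Map<String, String>> loader = path -> {
            loads.incrementAndGet();
            Map<String, String> table = new HashMap<>();
            table.put("k1", "v1");
            table.put("k2", "v2");
            return Collections.unmodifiableMap(table);
        };

        // Simulate 8 map-join tasks scheduled on an executor with 4 cores.
        ExecutorService cores = Executors.newFixedThreadPool(4);
        for (int task = 0; task < 8; task++) {
            cores.submit(() -> getOrLoad("/hypothetical/small_table", loader));
        }
        cores.shutdown();
        cores.awaitTermination(10, TimeUnit.SECONDS);

        System.out.println("loads = " + loads.get()); // prints "loads = 1"
    }
}
```

Without the cache, each of the eight simulated tasks would call the loader itself and hold its own copy; with it, the loader runs once and all tasks share one instance. A real implementation would also need an eviction policy so that tables from a finished query do not linger in executor memory.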



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)