Posted to dev@hive.apache.org by "Siddharth Seth (JIRA)" <ji...@apache.org> on 2014/10/24 20:38:33 UTC

[jira] [Updated] (HIVE-8597) SMB join small table side should use the same set of serialized payloads across tasks

     [ https://issues.apache.org/jira/browse/HIVE-8597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated HIVE-8597:
---------------------------------
    Attachment: HIVE-8597.1.patch

This patch creates one set of serialized splits per bucket and reuses it across all tasks processing that bucket. It also removes some unused variables and clears references that are no longer needed so they can be garbage-collected.
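
For readers without the patch at hand, the idea of serializing once per bucket and sharing the result can be sketched roughly as follows. This is not the code in HIVE-8597.1.patch: the class name `BucketPayloadCache`, its methods, and the use of plain strings as stand-ins for splits are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hive or Tez: caches serialized splits per bucket.
class BucketPayloadCache {
    // bucket id -> the single shared, unmodifiable list of serialized splits
    private final Map<Integer, List<ByteBuffer>> payloadsByBucket = new HashMap<>();

    /**
     * Serializes the splits of a bucket on the first call only. Later calls
     * for the same bucket return the same list instance, so tasks share one
     * copy of the bytes instead of each holding their own.
     * Strings stand in for real split objects here (an assumption).
     */
    List<ByteBuffer> payloadsFor(int bucketId, List<String> splits) {
        return payloadsByBucket.computeIfAbsent(bucketId, id -> {
            List<ByteBuffer> serialized = new ArrayList<>(splits.size());
            for (String split : splits) {
                byte[] bytes = split.getBytes(StandardCharsets.UTF_8);
                // Read-only view so no consumer can modify the shared bytes.
                serialized.add(ByteBuffer.wrap(bytes).asReadOnlyBuffer());
            }
            return Collections.unmodifiableList(serialized);
        });
    }

    /** Drops the cached payloads for a bucket so they become eligible for GC. */
    void release(int bucketId) {
        payloadsByBucket.remove(bucketId);
    }
}
```

In a sketch like this, each consumer would call `duplicate()` on a shared buffer before reading, so that it gets its own position and limit while the underlying bytes stay shared.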

[~vikram.dixit] - please review.

> SMB join small table side should use the same set of serialized payloads across tasks
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-8597
>                 URL: https://issues.apache.org/jira/browse/HIVE-8597
>             Project: Hive
>          Issue Type: Improvement
>          Components: Tez
>    Affects Versions: 0.14.0
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>             Fix For: 0.14.0
>
>         Attachments: HIVE-8597.1.patch
>
>
> Each task sees all splits belonging to the bucket it processes. At the moment, we end up using different instances of the same serialized split across tasks, which adds unnecessary memory pressure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)