Posted to dev@hive.apache.org by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org> on 2010/12/07 12:13:11 UTC

[jira] Updated: (HIVE-1695) MapJoin followed by ReduceSink should be done as single MapReduce Job

     [ https://issues.apache.org/jira/browse/HIVE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HIVE-1695:
-----------------------------------------

    Attachment: hive-1695.patch

Attaching an initial version of the patch. The patch does not yet have unit tests. I have manually tested some queries that combine mapjoin with order by, group by, and filter operators. All three cases appear to pass.

The code changes I have made appear to be final. I have not removed the current NodeProcessor code; instead, I added a new file so that the core logic is easier to review. Once the approach looks fine, I will remove the old code and upload a new patch along with unit test cases.


The code currently does not cover the case of joins being converted to mapjoins automatically as done in HIVE-1642. Should we address that in this issue or as a separate issue?

> MapJoin followed by ReduceSink should be done as single MapReduce Job
> ---------------------------------------------------------------------
>
>                 Key: HIVE-1695
>                 URL: https://issues.apache.org/jira/browse/HIVE-1695
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>         Attachments: hive-1695.patch
>
>
> Currently, a MapJoin followed by a ReduceSink runs as two MapReduce jobs: one map-only job followed by a map-reduce job. These can be combined into a single MapReduce job.
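
As a rough illustration of the description above, the following toy model counts jobs for a linear operator chain. It is only a sketch under stated assumptions: the function name plan_jobs, the operator names as plain strings, and the splitting rules are hypothetical simplifications, not Hive's actual planner code or the logic in the attached patch.

```python
def plan_jobs(operators, merge_mapjoin):
    """Split a linear operator chain into jobs (toy model, not Hive's planner).

    Assumed rules for this sketch:
      - a ReduceSink marks the shuffle of a map-reduce job;
      - a second ReduceSink in the same job forces a new job;
      - without merging, a map-side MapJoin ends its own map-only job.
    """
    jobs = []
    current = []
    has_shuffle = False
    for op in operators:
        if op == "ReduceSink" and has_shuffle:
            # A job has only one shuffle, so start a new job here.
            jobs.append(current)
            current = []
            has_shuffle = False
        current.append(op)
        if op == "ReduceSink":
            has_shuffle = True
        elif op == "MapJoin" and not merge_mapjoin and not has_shuffle:
            # Old behaviour in this model: the MapJoin closes a map-only job.
            jobs.append(current)
            current = []
    if current:
        jobs.append(current)
    return jobs


# Hypothetical chain for a mapjoin followed by a group by.
chain = ["TableScan", "MapJoin", "ReduceSink", "GroupBy", "FileSink"]
separate = plan_jobs(chain, merge_mapjoin=False)
merged = plan_jobs(chain, merge_mapjoin=True)
print(len(separate), len(merged))
```

In this model the unmerged plan yields a map-only job ending at the MapJoin plus a map-reduce job for the rest, while the merged plan keeps the MapJoin in the map phase of the single map-reduce job.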

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.