You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Reynold Xin (JIRA)" <ji...@apache.org> on 2015/12/17 07:35:46 UTC

[jira] [Closed] (SPARK-11512) Bucket Join

     [ https://issues.apache.org/jira/browse/SPARK-11512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin closed SPARK-11512.
-------------------------------
    Resolution: Duplicate

> Bucket Join
> -----------
>
>                 Key: SPARK-11512
>                 URL: https://issues.apache.org/jira/browse/SPARK-11512
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Cheng Hao
>
> Sort merge join on two datasets on the file system that have already been partitioned the same with the same number of partitions and sorted within each partition, and we don't need to sort it again while join with the sorted/partitioned keys
> This functionality exists in
> - Hive (hive.optimize.bucketmapjoin.sortedmerge)
> - Pig (USING 'merge')
> - MapReduce (CompositeInputFormat)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org