
Posted to issues@spark.apache.org by "gagan taneja (JIRA)" <ji...@apache.org> on 2015/01/17 03:12:34 UTC

[jira] [Created] (SPARK-5292) optimize join for table that are already sharded/support for hive bucket

gagan taneja created SPARK-5292:
-----------------------------------

             Summary: optimize join for table that are already sharded/support for hive bucket
                 Key: SPARK-5292
                 URL: https://issues.apache.org/jira/browse/SPARK-5292
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 1.2.0
            Reporter: gagan taneja


Currently, joins do not consider the locality of the data and perform a shuffle anyway.
If the user takes responsibility for distributing the data based on some hash, or has already sharded the data, a Spark join should be able to leverage that sharding to optimize the join computation and eliminate the shuffle.
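
To illustrate the idea, here is a minimal plain-Python sketch (not Spark code and not a proposed implementation). It assumes both tables were sharded up front with the same hash function and the same bucket count; `NUM_BUCKETS`, `bucket_of`, `shard`, and `copartitioned_join` are hypothetical names used only for this example, and `zlib.crc32` stands in for whatever hash the user chose. Under those assumptions, bucket i of one table only needs to be joined with bucket i of the other, so no rows have to move between buckets.

```python
from collections import defaultdict
import zlib

NUM_BUCKETS = 4  # hypothetical shard count; must be identical for both tables


def bucket_of(key, num_buckets=NUM_BUCKETS):
    """Deterministic bucket for a join key (crc32 stands in for the user's hash)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def shard(rows, num_buckets=NUM_BUCKETS):
    """Pre-shard (key, value) rows by key, as the user would do ahead of time."""
    buckets = [[] for _ in range(num_buckets)]
    for key, value in rows:
        buckets[bucket_of(key, num_buckets)].append((key, value))
    return buckets


def join_bucket(left_rows, right_rows):
    """Hash join within a single bucket; reads no rows from other buckets."""
    index = defaultdict(list)
    for key, value in left_rows:
        index[key].append(value)
    return [(key, lv, rv) for key, rv in right_rows for lv in index.get(key, [])]


def copartitioned_join(left_buckets, right_buckets):
    """Join bucket i with bucket i only, so no cross-bucket movement (shuffle) occurs."""
    if len(left_buckets) != len(right_buckets):
        raise ValueError("both sides must use the same number of buckets")
    result = []
    for left_rows, right_rows in zip(left_buckets, right_buckets):
        result.extend(join_bucket(left_rows, right_rows))
    return result


users = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(1, "book"), (3, "pen"), (3, "ink"), (4, "cup")]

joined = copartitioned_join(shard(users), shard(orders))
print(sorted(joined))
```

Because equal keys always hash to the same bucket, the per-bucket join returns the same rows as a join over the full tables. The sketch only holds when both sides agree on the hash function and the bucket count; otherwise a repartitioning step is still required.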



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
