Posted to issues@spark.apache.org by "gagan taneja (JIRA)" <ji...@apache.org> on 2015/01/17 03:12:34 UTC
[jira] [Created] (SPARK-5292) optimize join for table that are already sharded/support for hive bucket
gagan taneja created SPARK-5292:
-----------------------------------
Summary: optimize join for table that are already sharded/support for hive bucket
Key: SPARK-5292
URL: https://issues.apache.org/jira/browse/SPARK-5292
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 1.2.0
Reporter: gagan taneja
Currently, joins do not consider the locality of the data and perform a shuffle anyway.
If the user takes responsibility for distributing the data based on some hash, or has otherwise sharded the data, a Spark join should be able to leverage that sharding to optimize the join computation and eliminate the shuffle.
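To illustrate the idea, here is a minimal sketch in plain Python (not Spark code, and not a proposed implementation). It assumes both tables were sharded by the join key with the same hash function and the same number of buckets; the names `NUM_BUCKETS`, `bucket_of`, `shard` and `bucket_join` are hypothetical and exist only for this example. Under those assumptions, bucket i of one table only needs to be joined with bucket i of the other, so no rows have to be moved between buckets.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count; both tables must use the same value


def bucket_of(key, num_buckets=NUM_BUCKETS):
    """Deterministic hash bucket for a key (CRC32 of its string form)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def shard(rows, num_buckets=NUM_BUCKETS):
    """Distribute (key, value) rows into buckets.

    This stands in for the one-time sharding the user is assumed
    to have done before the join."""
    buckets = [[] for _ in range(num_buckets)]
    for key, value in rows:
        buckets[bucket_of(key, num_buckets)].append((key, value))
    return buckets


def bucket_join(left_buckets, right_buckets):
    """Inner join of two tables that were sharded the same way.

    Bucket i on the left is joined only with bucket i on the right,
    so no row is ever moved to a different bucket (no shuffle)."""
    if len(left_buckets) != len(right_buckets):
        raise ValueError("both sides must have the same number of buckets")
    result = []
    for left, right in zip(left_buckets, right_buckets):
        # Build a per-bucket hash index on the right side.
        index = defaultdict(list)
        for key, value in right:
            index[key].append(value)
        # Probe the index with the matching left bucket.
        for key, lvalue in left:
            for rvalue in index.get(key, []):
                result.append((key, lvalue, rvalue))
    return result


users = shard([(1, "alice"), (2, "bob"), (3, "carol")])
orders = shard([(1, "book"), (3, "pen"), (3, "ink"), (4, "cup")])
joined = bucket_join(users, orders)
```

If the two sides used different bucket counts or different hash functions, matching keys could land in different buckets and this shortcut would no longer be valid; a planner would have to fall back to a regular shuffle join in that case.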
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)