
Posted to reviews@spark.apache.org by wangshisan <gi...@git.apache.org> on 2018/10/09 02:57:32 UTC

[GitHub] spark issue #21156: [SPARK-24087][SQL] Avoid shuffle when join keys are a su...

Github user wangshisan commented on the issue:

    https://github.com/apache/spark/pull/21156
  
    What is the status of this PR? I think it is very valuable because it gives users more opportunities to use bucket joins: every join whose join keys have the bucket keys as a prefix would benefit.
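Why the bucket keys only need to be a subset of the join keys: any two rows that agree on every join key also agree on the bucket key, so they land in the same bucket index. The Python sketch below simulates this with two small hypothetical tables. `bucketize` and `join_ids` are illustrative helpers, not Spark APIs. Python's built-in `hash` stands in for Spark's Murmur3-based bucket hash, and both tables are assumed to use the same hash and the same number of buckets.

```python
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, int]


def bucketize(rows: List[Row], keys: Sequence[str], n: int) -> List[List[Row]]:
    """Split rows into n buckets by hashing the bucket-key values."""
    buckets: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        # Python's hash() stands in for Spark's Murmur3 bucket hash (assumption).
        buckets[hash(tuple(row[k] for k in keys)) % n].append(row)
    return buckets


def join_ids(left: List[Row], right: List[Row],
             pairs: Sequence[Tuple[str, str]]) -> Set[Tuple[int, int]]:
    """Nested-loop equi-join; returns (a3, b3) id pairs of matching rows."""
    return {
        (lrow["a3"], rrow["b3"])
        for lrow, rrow in product(left, right)
        if all(lrow[a] == rrow[b] for a, b in pairs)
    }


# Hypothetical tables: a3 and b3 serve as unique row ids.
A = [{"a1": i % 3, "a2": i % 2, "a3": i} for i in range(12)]
B = [{"b1": i % 3, "b2": i % 4, "b3": i} for i in range(12)]
pairs = [("a1", "b1"), ("a2", "b2")]  # join keys are a superset of the bucket keys
n = 4

# A is bucketed by a1 and B by b1, so bucket i of A only needs bucket i of B.
bucketwise_pairs: Set[Tuple[int, int]] = set()
for bucket_a, bucket_b in zip(bucketize(A, ["a1"], n), bucketize(B, ["b1"], n)):
    bucketwise_pairs |= join_ids(bucket_a, bucket_b, pairs)

full_pairs = join_ids(A, B, pairs)
assert bucketwise_pairs == full_pairs  # no match is lost by skipping the shuffle
print(len(full_pairs))  # 12
```

The bucket-by-bucket join finds exactly the same matches as the full join, which is the property that lets the planner drop the exchange on both sides.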
    We also have a further optimization for this case:
    1. Table A(a1, a2, a3) is bucketed by a1, a2
    2. Table B(b1, b2, b3) is bucketed by b1.
    3. A join B on (a1=b1, a2=b2, a3=b3)
    
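    In this case, only table B needs an extra shuffle, and its shuffle keys are (b1, b2): the join-key counterparts of A's bucket keys (a1, a2), so the shuffled B can line up with A's existing buckets.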
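The decision in the example above can be sketched as follows. This is a minimal illustration, not Spark's planner code. `plan_one_side_shuffle` is a hypothetical helper that maps A's bucket keys through the join condition to find the B-side keys that B must be hash-partitioned on. It assumes B would be shuffled into as many partitions as A has buckets, using the same hash function, and it ignores bucket counts otherwise.

```python
from typing import List, Optional, Sequence, Tuple


def plan_one_side_shuffle(
    a_buckets: Sequence[str],
    b_buckets: Sequence[str],
    join_pairs: Sequence[Tuple[str, str]],
) -> Optional[List[str]]:
    """Return the keys B must be shuffled on so it lines up with A's buckets.

    [] means B is already bucketed compatibly (no shuffle needed);
    None means A's bucketing cannot be reused because one of A's bucket
    keys is not a join key. Assumes each A-side key appears once in join_pairs.
    """
    a_to_b = dict(join_pairs)  # e.g. {"a1": "b1", "a2": "b2", "a3": "b3"}
    if not all(k in a_to_b for k in a_buckets):
        return None
    # B's counterparts of A's bucket keys, in A's bucket-key order,
    # since the bucket hash depends on column order.
    target = [a_to_b[k] for k in a_buckets]
    return [] if list(b_buckets) == target else target


# The scenario above: A bucketed by (a1, a2), B bucketed by b1.
keys = plan_one_side_shuffle(
    ["a1", "a2"], ["b1"], [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
)
print(keys)  # ['b1', 'b2']
```

Shuffling B on (b1, b2), rather than on all three join keys, is what allows A's existing buckets to be reused without shuffling A.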

